1. 13 Jul, 2018 1 commit
    • tcp: use monotonic timestamps for PAWS · cca9bab1
      Arnd Bergmann committed
      Using get_seconds() for timestamps is deprecated since it can lead
      to overflows on 32-bit systems. While the interface generally doesn't
      overflow until year 2106, the specific implementation of the TCP PAWS
      algorithm breaks in 2038 when the intermediate signed 32-bit timestamps
      overflow.
      
      A related problem is that the local timestamps in CLOCK_REALTIME form
      lead to unexpected behavior when settimeofday is called to set the system
      clock backwards or forwards by more than 24 days.
      
      While the first problem could be solved by using an overflow-safe method
      of comparing the timestamps, a nicer solution is to use a monotonic
      clocksource with ktime_get_seconds() that simply doesn't overflow (at
      least not until 136 years after boot) and that doesn't change during
      settimeofday().
      
      To make 32-bit and 64-bit architectures behave the same way here, and
      also save a few bytes in the tcp_options_received structure, I'm changing
      the type to a 32-bit integer, which is now safe on all architectures.
      
      Finally, the ts_recent_stamp field also (confusingly) gets used to store
      a jiffies value in tcp_synq_overflow()/tcp_synq_no_recent_overflow().
      This is currently safe, but changing the type to 32-bit requires
      some small changes there to keep it working.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
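The "overflow-safe method of comparing the timestamps" that the message mentions as the alternative can be sketched in userspace C as below. This is only an illustration of wraparound-safe 32-bit comparison, not the code from the patch; the `ts32_*` helpers and `EX_24DAYS` are hypothetical names, and the signed cast assumes two's-complement conversion (as GCC and Clang provide).

```c
#include <stdbool.h>
#include <stdint.h>

/* 24 days in seconds, the figure quoted in the commit message. */
#define EX_24DAYS (60u * 60u * 24u * 24u)

/* Hypothetical helper: true if timestamp a is later than b.
 * Correct as long as the two are less than 2^31 seconds apart,
 * even when the 32-bit counter has wrapped in between. */
static inline bool ts32_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Hypothetical helper: true if more than 24 days passed between
 * stamp and now. Unsigned subtraction absorbs a single wrap. */
static inline bool ts32_older_than_24days(uint32_t now, uint32_t stamp)
{
	return (uint32_t)(now - stamp) > EX_24DAYS;
}
```

With a monotonic clock such as the one the commit switches to, the wrap case is not expected to occur in practice, which is why the simpler direct comparison becomes safe.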
  2. 12 Jul, 2018 2 commits
    • net: Add lag.h, net_lag_port_dev_txable() · eeed992b
      Petr Machata committed
      LAG devices (team or bond) recognize for each one of their slave devices
      whether LAG traffic is going to be sent through that device. Bond calls
      such devices "active", team calls them "txable". When this state
      changes, a NETDEV_CHANGELOWERSTATE notification is distributed, together
      with a netdev_notifier_changelowerstate_info structure that for LAG
      devices includes a tx_enabled flag that refers to the new state. The
      notification thus makes it possible to react to the changes in txability
      in drivers.
      
      However, there's no way to query txability from the outside on demand.
      That is a problem for mlxsw in particular: when resolving an ERSPAN packet
      path, it may encounter a LAG device and need to determine which of the
      slaves it should choose.
      
      To that end, introduce a new function, net_lag_port_dev_txable(), which
      determines whether a given slave device is "active" or
      "txable" (depending on the flavor of the LAG device). That function then
      dispatches to per-LAG-flavor helpers, bond_is_active_slave_dev() and
      team_port_dev_txable(), respectively.
      
      Because there currently is no good place where net_lag_port_dev_txable()
      should be added, introduce a new header file, lag.h, which should from
      now on hold any logic common to both team and bond. (But keep
      netif_is_lag_master() together with the rest of netif_is_*_master()
      functions).
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
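The dispatch described in this message can be pictured with a small userspace model. It is a sketch only: `struct ex_port_dev`, `enum ex_lag_kind` and the `ex_*` functions are hypothetical stand-ins, not the kernel's `struct net_device` or the real helpers named in the commit.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the LAG flavor that owns a port. */
enum ex_lag_kind { EX_LAG_BOND, EX_LAG_TEAM };

/* Hypothetical stand-in for a slave device and its per-flavor state. */
struct ex_port_dev {
	enum ex_lag_kind kind;
	bool bond_active;   /* bond terminology: the slave is "active" */
	bool team_txable;   /* team terminology: the port is "txable" */
};

/* Per-flavor helpers, modeled on the roles of
 * bond_is_active_slave_dev() and team_port_dev_txable(). */
static bool ex_bond_is_active_slave_dev(const struct ex_port_dev *dev)
{
	return dev->bond_active;
}

static bool ex_team_port_dev_txable(const struct ex_port_dev *dev)
{
	return dev->team_txable;
}

/* Flavor-independent query: pick the helper that matches the LAG type. */
static bool ex_lag_port_dev_txable(const struct ex_port_dev *dev)
{
	if (dev->kind == EX_LAG_TEAM)
		return ex_team_port_dev_txable(dev);
	return ex_bond_is_active_slave_dev(dev);
}
```

A caller such as a driver resolving a mirroring path would then iterate over the slaves and pick the first one for which the query returns true.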
    • tcp: expose both send and receive intervals for rate sample · 4929c942
      Deepti Raghavan committed
      Congestion control algorithms, which access the rate sample
      through the tcp_cong_control function, only have access to the maximum
      of the send and receive interval, for cases where the acknowledgment
      rate may be inaccurate due to ACK compression or decimation. Algorithms
      may want to use send rates and receive rates as separate signals.
      Signed-off-by: Deepti Raghavan <deeptir@mit.edu>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
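To illustrate why separate intervals can be useful, the following sketch computes a send rate and a receive rate alongside the combined (maximum) interval. The struct layout and all `ex_*` names are assumptions made for illustration; they are not the kernel's rate sample definition.

```c
#include <stdint.h>

/* Hypothetical model of a rate sample carrying both intervals. */
struct ex_rate_sample {
	uint32_t delivered;        /* packets delivered over the sample */
	uint32_t snd_interval_us;  /* time spent sending them */
	uint32_t rcv_interval_us;  /* time over which their ACKs arrived */
};

/* The conservative combined interval: the longer of the two phases. */
static uint32_t ex_interval_us(const struct ex_rate_sample *rs)
{
	return rs->snd_interval_us > rs->rcv_interval_us ?
	       rs->snd_interval_us : rs->rcv_interval_us;
}

/* Packets per second over a given interval; 0 if the interval is empty. */
static uint64_t ex_rate_pps(uint32_t delivered, uint32_t interval_us)
{
	if (interval_us == 0)
		return 0;
	return (uint64_t)delivered * 1000000ULL / interval_us;
}
```

An algorithm that only sees the maximum gets a single, lower rate estimate; with both fields it can compare `ex_rate_pps(rs.delivered, rs.snd_interval_us)` against `ex_rate_pps(rs.delivered, rs.rcv_interval_us)` and treat a large gap as a signal in its own right.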
  3. 08 Jul, 2018 7 commits
  4. 07 Jul, 2018 8 commits
  5. 06 Jul, 2018 1 commit
  6. 05 Jul, 2018 6 commits
  7. 04 Jul, 2018 8 commits
    • net/sched: Make etf report drops on error_queue · 4b15c707
      Jesus Sanchez-Palencia committed
      Use the socket error queue for reporting dropped packets if the
      socket has enabled that feature through the SO_TXTIME API.
      
      Packets are dropped either on enqueue() if they aren't accepted by the
      qdisc or on dequeue() if the system misses their deadline. Those are
      reported as different errors so applications can react accordingly.
      
      Userspace can retrieve the errors through the socket error queue and the
      corresponding cmsg interfaces. A struct sock_extended_err* is used for
      returning the error data, and the packet's timestamp can be retrieved by
      combining the ee_data and ee_info fields, e.g.:
      
          ((__u64) serr->ee_data << 32) + serr->ee_info
      
      This feature is disabled by default and must be explicitly enabled by
      applications. Enabling it can bring some overhead for the Tx cycles
      of the application.
      Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
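The timestamp expression quoted in this message can be wrapped in a small helper. The function below is a hypothetical convenience wrapper; it takes the two 32-bit fields as plain integers so that it compiles without the kernel uapi headers.

```c
#include <stdint.h>

/* Rebuild the 64-bit transmit time of a dropped packet from the two
 * 32-bit fields of the extended error: ee_data carries the upper half,
 * ee_info the lower half. */
static uint64_t ex_txtime_from_err(uint32_t ee_data, uint32_t ee_info)
{
	return ((uint64_t)ee_data << 32) + ee_info;
}
```

In an application, the two arguments would come from the `struct sock_extended_err` found in the control message read from the socket error queue.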
    • net/sched: Add HW offloading capability to ETF · 88cab771
      Jesus Sanchez-Palencia committed
      Add infra so etf qdisc supports HW offload of time-based transmission.
      
      For hw offload, the time sorted list is still used, so packets are
      dequeued always in order of txtime.
      
      Example:
      
      $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
                 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
      
      $ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 100000 \
      	   clockid CLOCK_REALTIME
      
      In this example, the Qdisc will use HW offload for the control of the
      transmission time through the network adapter. The hrtimer used for
      packet scheduling inside the qdisc will use the clockid CLOCK_REALTIME
      as reference and packets leave the Qdisc "delta" (100000) nanoseconds
      before their transmission time. Because this will be using HW offload and
      since dynamic clocks are not supported by the hrtimer, the system clock
      and the PHC clock must be synchronized for this mode to behave as
      expected.
      Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: Allow creating a Qdisc watchdog with other clocks · 860b642b
      Vinicius Costa Gomes committed
      This adds 'qdisc_watchdog_init_clockid()', which allows a clockid to be
      passed so that other time references can be used when scheduling
      the Qdisc to run.
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: Hook into time based transmission · bc969a97
      Jesus Sanchez-Palencia committed
      Add a transmit_time field to struct inet_cork, then copy the
      timestamp from the CMSG cookie at ip_setup_cork() so we can
      safely copy it into the skb later during __ip_make_skb().
      
      For the raw fast path, just perform the copy at raw_send_hdrinc().
      Signed-off-by: Richard Cochran <rcochran@linutronix.de>
      Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add a new socket option for a future transmit time. · 80b14dee
      Richard Cochran committed
      This patch introduces SO_TXTIME. User space enables this option in
      order to pass a desired future transmit time in a CMSG when calling
      sendmsg(2). The argument to this socket option is an 8-byte struct
      provided by the uapi header net_tstamp.h, defined as:
      
      struct sock_txtime {
      	clockid_t 	clockid;
      	u32		flags;
      };
      
      Note that new fields were added to struct sock by filling a 2-byte
      hole found in the struct. For that reason, neither the struct size nor
      the number of cachelines was altered.
      Signed-off-by: Richard Cochran <rcochran@linutronix.de>
      Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
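A minimal userspace sketch of enabling the option might look like the following. To keep it buildable against older headers, it uses a local mirror of the struct quoted above (`struct ex_sock_txtime`, a hypothetical name) and falls back to 61 for `SO_TXTIME`, which is the asm-generic value and is an assumption on architectures that define their own socket option numbers.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

#ifndef SO_TXTIME
#define SO_TXTIME 61 /* assumption: asm-generic value, may differ per arch */
#endif

/* Local mirror of the struct quoted in the commit message. */
struct ex_sock_txtime {
	clockid_t clockid; /* clock the transmit times refer to */
	uint32_t  flags;   /* option flags; 0 requests none */
};

/* Enable SO_TXTIME on fd. Returns 0 on success, -1 with errno set on
 * failure (for example on kernels that predate the option). */
static int ex_enable_txtime(int fd, clockid_t clockid, uint32_t flags)
{
	struct ex_sock_txtime cfg = { .clockid = clockid, .flags = flags };

	return setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));
}
```

After the option is enabled, the per-packet transmit time is passed as ancillary data with each sendmsg(2) call, as the message describes.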
    • net: ipv4: listified version of ip_rcv · 17266ee9
      Edward Cree committed
      Also involved adding a way to run a netfilter hook over a list of packets.
      Rather than attempting to make netfilter know about lists (which would be
      a major project in itself) we just let it call the regular okfn (in this
      case ip_rcv_finish()) for any packets it steals, and have it give us back
      a list of packets it's synchronously accepted (which normally NF_HOOK
      would automatically call okfn() on, but we want to be able to potentially
      pass the list to a listified version of okfn().)
      The netfilter hooks themselves are indirect calls that still happen
      per-packet (see nf_hook_entry_hookfn()), but again, changing that can be
      left for future work.
      
      There is potential for out-of-order receives if the netfilter hook ends up
      synchronously stealing packets, as they will be processed before any
      accepts earlier in the list.  However, it was already possible for an
      asynchronous accept to cause out-of-order receives, so presumably this is
      considered OK.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
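The idea of running a per-packet hook over a list and getting back only the synchronously accepted packets can be modeled as below. This is a simplified userspace sketch: the types, the verdict names and `ex_hook_list()` are hypothetical, and it only unlinks packets that are not accepted instead of freeing them or handing them to another owner.

```c
#include <stddef.h>

/* Hypothetical verdicts a hook can return for one packet. */
enum ex_verdict { EX_ACCEPT, EX_DROP, EX_STOLEN };

/* Hypothetical packet on a singly linked list. */
struct ex_pkt {
	int id;
	struct ex_pkt *next;
};

typedef enum ex_verdict (*ex_hook_fn)(const struct ex_pkt *pkt);

/* Call the hook once per packet and return the sublist of accepted
 * packets in their original order. Dropped and stolen packets are
 * unlinked from the result. */
static struct ex_pkt *ex_hook_list(struct ex_pkt *head, ex_hook_fn hook)
{
	struct ex_pkt *accepted = NULL;
	struct ex_pkt **tail = &accepted;

	while (head) {
		struct ex_pkt *next = head->next;

		head->next = NULL;
		if (hook(head) == EX_ACCEPT) {
			*tail = head;
			tail = &head->next;
		}
		head = next;
	}
	return accepted;
}

/* Example hook: drop packets with odd ids, accept the rest. */
static enum ex_verdict ex_drop_odd(const struct ex_pkt *pkt)
{
	return (pkt->id % 2) ? EX_DROP : EX_ACCEPT;
}
```

The returned sublist is what a listified continuation function would then process in one call, which is the batching opportunity the message is after.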
    • sctp: add support for dscp and flowlabel per transport · 8a9c58d2
      Xin Long committed
      Like some other per transport params, flowlabel and dscp are added
      in transport, asoc and sctp_sock. By default, transport sets its
      value from asoc's, and asoc does it from sctp_sock. flowlabel
      only works for ipv6 transport.
      
      Besides being passed down in sctp_xmit, they also need to be set in
      flow4/6 before the route lookup in get_dst.
      
      Note that it uses '& 0x100000' to check if flowlabel is set and
      '& 0x1' (tos 1st bit is unused) to check if dscp is set by users,
      so that they could be set to 0 by sockopt in next patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
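The "set" marker bits mentioned in the note can be sketched as follows. The two marker masks (0x100000 and 0x1) come from the message; the value masks (20 bits for the flow label, 0xfc for the DSCP part of the tos byte) and all `EX_*`/`ex_*` names are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_FLOWLABEL_SET_MASK 0x100000u /* marker bit from the message */
#define EX_FLOWLABEL_VAL_MASK 0x0fffffu /* assumption: 20-bit flow label */
#define EX_DSCP_SET_MASK      0x1u      /* marker bit from the message */
#define EX_DSCP_VAL_MASK      0xfcu     /* assumption: upper six tos bits */

/* Store a user-supplied flow label together with the "set" marker, so
 * that an explicit 0 can be told apart from "never configured". */
static uint32_t ex_flowlabel_store(uint32_t label)
{
	return (label & EX_FLOWLABEL_VAL_MASK) | EX_FLOWLABEL_SET_MASK;
}

static bool ex_flowlabel_is_set(uint32_t stored)
{
	return (stored & EX_FLOWLABEL_SET_MASK) != 0;
}

static uint32_t ex_flowlabel_value(uint32_t stored)
{
	return stored & EX_FLOWLABEL_VAL_MASK;
}

/* Same pattern for dscp, using the unused lowest tos bit as marker. */
static uint8_t ex_dscp_store(uint8_t tos)
{
	return (uint8_t)((tos & EX_DSCP_VAL_MASK) | EX_DSCP_SET_MASK);
}

static bool ex_dscp_is_set(uint8_t stored)
{
	return (stored & EX_DSCP_SET_MASK) != 0;
}

static uint8_t ex_dscp_value(uint8_t stored)
{
	return (uint8_t)(stored & EX_DSCP_VAL_MASK);
}
```

The point of the marker is visible in the zero case: `ex_flowlabel_store(0)` yields a non-zero stored value, so a later reader can still distinguish "set to 0" from "not set".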
    • ipv4: add __ip_queue_xmit() that supports tos param · 69b9e1e0
      Xin Long committed
      This patch introduces __ip_queue_xmit(), through which the callers
      can pass tos param into it without having to set inet->tos. For
      ipv6, ip6_xmit() already allows passing tclass parameter.
      
      It's needed when some transport protocol doesn't use inet->tos,
      like sctp's per transport dscp, which will be added in next patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 02 Jul, 2018 3 commits
  9. 30 Jun, 2018 3 commits
    • net/smc: add pnetid support for SMC-D and ISM · 1619f770
      Hans Wippel committed
      SMC-D relies on PNETIDs to find usable SMC-D/ISM devices for an SMC
      connection. This patch adds SMC-D/ISM support to the current PNETID
      implementation.
      Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: add base infrastructure for SMC-D and ISM · c6ba7c9b
      Hans Wippel committed
      SMC supports two variants: SMC-R and SMC-D. For data transport, SMC-R
      uses RDMA devices, SMC-D uses so-called Internal Shared Memory (ISM)
      devices. An ISM device only allows shared memory communication between
      SMC instances on the same machine. For example, this allows virtual
      machines on the same host to communicate via SMC without RDMA devices.
      
      This patch adds the base infrastructure for SMC-D and ISM devices to
      the existing SMC code. It contains the following:
      
      * ISM driver interface:
        This interface allows an ISM driver to register ISM devices in SMC. In
        the process, the driver provides a set of device ops for each device.
        SMC uses these ops to execute SMC specific operations on or transfer
        data over the device.
      
      * Core SMC-D link group, connection, and buffer support:
        Link groups, SMC connections and SMC buffers (in smc_core) are
        extended to support SMC-D.
      
      * SMC type checks:
        Some type checks are added to prevent using SMC-R specific code for
        SMC-D and vice versa.
      
      To actually use SMC-D, additional changes to pnetid, CLC, CDC, etc. are
      required. These are added in follow-up patches.
      Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: add pnetid support · 0afff91c
      Ursula Braun committed
      s390 hardware supports the definition of a so-called Physical NETwork
      IDentifier (PNETID for short) per network device port. These PNETIDs
      can be used to identify network devices that are attached to the same
      physical network (broadcast domain).
      
      On s390 try to use the PNETID of the ethernet device port used for
      initial connecting, and derive the IB device port used for SMC RDMA
      traffic.
      
      On platforms without PNETID support fall back to the existing
      solution of a configured pnet table.
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 29 Jun, 2018 1 commit