1. 11 Nov 2018, 2 commits
  2. 04 Nov 2018, 1 commit
    • net: bql: add __netdev_tx_sent_queue() · 3e59020a
      Committed by Eric Dumazet
      When qdisc_run() tries to use the BQL budget to bulk-dequeue a batch
      of packets, GSO can later transform this list into another list
      of skbs, and each skb is sent to the device ndo_start_xmit(),
      one at a time, with skb->xmit_more set to one for all but
      the last skb.
      
      The problem is that, very often, the BQL limit is hit in the middle
      of the packet train, forcing dev_hard_start_xmit() to stop the
      bulk send and requeue the remainder of the list.
      
      BQL's role is to avoid head-of-line blocking, making sure
      a qdisc can deliver high-priority packets before low-priority ones.
      
      But there is no way for requeued packets to be bypassed by fresh
      packets in the qdisc.
      
      Aborting the bulk send increases TX softirqs, and hot cache
      lines (after skb_segment()) are wasted.
      
      Note that for TSO packets, we never split a packet in the middle
      because the BQL limit is hit.
      
      Drivers should be able to update the BQL counters without having
      to check or act on the BQL status, if the current skb
      has xmit_more set.
      
      Upper layers are ultimately responsible for not sending another
      packet train when the BQL limit is hit.
      
      A code template in a driver might look like the following:
      
      	send_doorbell = __netdev_tx_sent_queue(tx_queue, nr_bytes, skb->xmit_more);
      
      Note that using __netdev_tx_sent_queue() is not mandatory,
      since a following patch will change dev_hard_start_xmit()
      to not care about the BQL status.
      
      But it is highly recommended, so that the full benefits of xmit_more
      can be reached (fewer doorbells sent, and fewer atomic operations as well).
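      
      As a fuller, hedged sketch (not part of the patch): a driver's TX
      path might use it as below, where tx_queue is the netdev_queue
      obtained via netdev_get_tx_queue() and mydrv_ring_doorbell() is a
      hypothetical helper:
      
      	/* Account nr_bytes in BQL; ring the doorbell only when this
      	 * is the last skb of the train or BQL stopped the queue.
      	 */
      	bool send_doorbell;
      
      	send_doorbell = __netdev_tx_sent_queue(tx_queue, nr_bytes,
      					       skb->xmit_more);
      	if (send_doorbell)
      		mydrv_ring_doorbell(tx_ring);	/* hypothetical helper */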
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 16 Oct 2018, 1 commit
    • FDDI: defza: Support capturing outgoing SMT traffic · 9f9a742d
      Committed by Maciej W. Rozycki
      The DEC FDDIcontroller 700 (DEFZA) uses a Tx/Rx queue pair to exchange
      SMT frames with the adapter's firmware.  Any SMT frame received from the
      RMC via the Rx queue is queued back by the driver to the SMT Rx queue
      for the firmware to process.  Similarly, the firmware uses the SMT Tx
      queue to supply the driver with SMT frames, which are queued back to the
      Tx queue for the RMC to send to the ring.
      
      When a network tap is attached to an FDDI interface handled by `defza',
      any captured incoming SMT frames go through our usual processing of
      received network data, which in turn delivers them to any listening
      taps.
      
      However, the outgoing SMT frames produced by the firmware bypass our
      network protocol stack and are therefore not delivered to taps.  This in
      turn means that taps miss a part of the network traffic sent by the
      adapter, which may make it more difficult to track down network problems
      or do general traffic analysis.
      
      Therefore, call `dev_queue_xmit_nit' in the SMT Tx path, having checked
      that a network tap is attached, using a newly-created `dev_nit_active'
      helper that wraps the usual condition used in the transmit path.
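      
      A minimal sketch of that check in the SMT Tx path (surrounding driver
      context omitted; skb and dev are the SMT frame and the FDDI device):
      
      	/* Hand the firmware-generated SMT frame to any attached taps
      	 * before queueing it back for the RMC to transmit.
      	 */
      	if (dev_nit_active(dev))
      		dev_queue_xmit_nit(skb, dev);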
      Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 11 Oct 2018, 1 commit
    • net: ipv4: update fnhe_pmtu when first hop's MTU changes · af7d6cce
      Committed by Sabrina Dubroca
      Since commit 5aad1de5 ("ipv4: use separate genid for next hop
      exceptions"), exceptions get deprecated separately from cached
      routes. In particular, administrative changes don't clear PMTU anymore.
      
      As Stefano described in commit e9fa1495 ("ipv6: Reflect MTU changes
      on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
      the local MTU change can become stale:
       - if the local MTU is now lower than the PMTU, that PMTU is now
         incorrect
       - if the local MTU was the lowest value in the path, and is increased,
         we might discover a higher PMTU
      
      Similarly to what commit e9fa1495 did for IPv6, update PMTU in those
      cases.
      
      If the exception was locked, the discovered PMTU was smaller than the
      minimal accepted PMTU. In that case, if the new local MTU is smaller
      than the current PMTU, let PMTU discovery figure out if locking of the
      exception is still needed.
      
      To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
      notifier. By the time the notifier is called, dev->mtu has been
      changed. This patch adds the old MTU as additional information in the
      notifier structure, and a new call_netdevice_notifiers_u32() function.
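      
      A hedged sketch of a NETDEV_CHANGEMTU subscriber reading the old MTU
      through the extended info (the handler name is illustrative; ptr
      points to a netdev_notifier_info_ext only for events raised via the
      new helper):
      
      	static int my_mtu_event(struct notifier_block *nb,
      				unsigned long event, void *ptr)
      	{
      		struct netdev_notifier_info_ext *ext = ptr;
      		struct net_device *dev = ext->info.dev;
      
      		if (event == NETDEV_CHANGEMTU)
      			pr_info("%s: MTU %u -> %u\n", dev->name,
      				ext->ext.mtu, dev->mtu);
      		return NOTIFY_DONE;
      	}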
      
      Fixes: 5aad1de5 ("ipv4: use separate genid for next hop exceptions")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 05 Oct 2018, 1 commit
  6. 27 Sep 2018, 1 commit
  7. 19 Sep 2018, 1 commit
  8. 14 Sep 2018, 1 commit
  9. 06 Sep 2018, 1 commit
    • packet: add sockopt to ignore outgoing packets · fa788d98
      Committed by Vincent Whitchurch
      Currently, the only way to ignore outgoing packets on a packet socket is
      via the BPF filter.  With MSG_ZEROCOPY, packets that are looped into
      AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even
      if the filter run from packet_rcv() would reject them.  So the presence
      of a packet socket on the interface takes away the benefits of
      MSG_ZEROCOPY, even if the packet socket is not interested in outgoing
      packets.  (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily
      cloned, but the cost for that is much lower.)
      
      Add a socket option to allow AF_PACKET sockets to ignore outgoing
      packets to solve this.  Note that the *BSDs already have something
      similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT.
      
      The first intended user is lldpd.
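      
      As a usage sketch, the option (PACKET_IGNORE_OUTGOING in the merged
      patch) is enabled like any other packet-socket option; error handling
      omitted:
      
      	/* Ask the kernel not to loop outgoing packets to this socket. */
      	int one = 1;
      
      	setsockopt(fd, SOL_PACKET, PACKET_IGNORE_OUTGOING,
      		   &one, sizeof(one));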
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 30 Aug 2018, 1 commit
  11. 11 Aug 2018, 1 commit
  12. 01 Aug 2018, 2 commits
  13. 30 Jul 2018, 1 commit
  14. 17 Jul 2018, 1 commit
    • net: convert gro_count to bitmask · d9f37d01
      Committed by Li RongQing
      gro_hash is 192 bytes in size and spans 3 cache lines. If there are few
      flows, gro_hash may not be fully used, so it is unnecessary to iterate
      over all of gro_hash in napi_gro_flush() and touch those extra cache lines.
      
      Convert gro_count to a bitmask and rename it gro_bitmask. Each bit
      represents an element of gro_hash; only flush a gro_hash element if the
      related bit is set, to speed up napi_gro_flush().
      
      Also, update gro_bitmask only when it actually changes, to reduce
      cache-line updates.
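      
      A rough sketch of the resulting flush loop (GRO_HASH_BUCKETS and
      __napi_gro_flush_chain() follow the naming used by the patch):
      
      	/* Visit only hash buckets whose bit is set, skipping the
      	 * cache lines of unused buckets.
      	 */
      	unsigned int i;
      
      	for (i = 0; i < GRO_HASH_BUCKETS; i++) {
      		if (test_bit(i, &napi->gro_bitmask))
      			__napi_gro_flush_chain(napi, i, flush_old);
      	}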
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 16 Jul 2018, 1 commit
  16. 14 Jul 2018, 2 commits
  17. 10 Jul 2018, 5 commits
  18. 05 Jul 2018, 1 commit
  19. 04 Jul 2018, 3 commits
    • net/sched: Introduce the ETF Qdisc · 25db26a9
      Committed by Vinicius Costa Gomes
      The ETF (Earliest TxTime First) qdisc uses the information added
      earlier in this series (the socket option SO_TXTIME and the new
      role of sk_buff->tstamp) to schedule packet transmission based
      on absolute time.
      
      For some workloads, just bandwidth enforcement is not enough, and
      precise control of the transmission of packets is necessary.
      
      Example:
      
      $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
                 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
      
      $ tc qdisc add dev enp2s0 parent 100:1 etf delta 100000 \
                 clockid CLOCK_TAI
      
      In this example, the qdisc will provide SW best-effort control of the
      transmission time to the network adapter. The timestamp in the socket
      will be in reference to the clockid CLOCK_TAI, and packets will leave
      the qdisc "delta" (100000) nanoseconds before their transmission time.
      
      The ETF qdisc buffers packets sorted by their txtime. It will drop
      packets on enqueue() if their skb clockid does not match the clock
      reference of the qdisc. Moreover, on dequeue(), a packet will be dropped
      if it expired while sitting in the queue.
      
      The qdisc also supports the SO_TXTIME deadline mode. For this mode, it
      will dequeue a packet as soon as possible and change the skb timestamp
      to 'now' during etf_dequeue().
      
      Note that both the qdisc's and the SO_TXTIME ABIs allow for a clockid
      to be configured, but it's been decided that usage of CLOCK_TAI should
      be enforced until we decide to allow for other clockids to be used.
      The rationale here is that PTP times are usually in the TAI scale, thus
      no other clocks should be necessary. For now, the qdisc will return
      EINVAL if any clocks other than CLOCK_TAI are used.
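      
      For reference, a hedged sketch of opting a socket into SO_TXTIME as
      introduced earlier in this series (error handling omitted):
      
      	/* sk_buff->tstamp will now carry absolute CLOCK_TAI
      	 * transmission times for packets sent on this socket.
      	 */
      	struct sock_txtime txtime_cfg = {
      		.clockid = CLOCK_TAI,
      		.flags = 0,	/* or SOF_TXTIME_DEADLINE_MODE */
      	};
      
      	setsockopt(fd, SOL_SOCKET, SO_TXTIME,
      		   &txtime_cfg, sizeof(txtime_cfg));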
      Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: listified version of ip_rcv · 17266ee9
      Committed by Edward Cree
      This also involved adding a way to run a netfilter hook over a list of
      packets.  Rather than attempting to make netfilter know about lists
      (which would be a major project in itself), we just let it call the
      regular okfn (in this case ip_rcv_finish()) for any packets it steals,
      and have it give us back a list of packets it has synchronously accepted
      (which normally NF_HOOK would automatically call okfn() on, but we want
      to be able to potentially pass the list to a listified version of
      okfn()).
      
      The netfilter hooks themselves are still indirect calls that happen per
      packet (see nf_hook_entry_hookfn()), but again, changing that can be
      left for future work.
      
      There is potential for out-of-order receives if the netfilter hook ends
      up synchronously stealing packets, as those will be processed before any
      accepts earlier in the list.  However, it was already possible for an
      asynchronous accept to cause out-of-order receives, so presumably this
      is considered OK.
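      
      A sketch of the resulting call shape in the IPv4 receive path,
      assuming the NF_HOOK_LIST entry point this series introduces:
      
      	/* Run PRE_ROUTING once over the whole list; packets that
      	 * survive the hook stay on "head" for the listified okfn.
      	 */
      	NF_HOOK_LIST(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
      		     net, NULL, head, dev, NULL, ip_rcv_finish);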
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: trivial netif_receive_skb_list() entry point · f6ad8c1b
      Committed by Edward Cree
      Just calls netif_receive_skb() in a loop.
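      
      That is, roughly this sketch (each skb is detached before ownership
      passes to netif_receive_skb(), which may free it):
      
      	void netif_receive_skb_list(struct list_head *head)
      	{
      		struct sk_buff *skb, *next;
      
      		list_for_each_entry_safe(skb, next, head, list) {
      			list_del(&skb->list);
      			netif_receive_skb(skb);
      		}
      	}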
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 02 Jul 2018, 2 commits
  21. 26 Jun 2018, 2 commits
  22. 05 Jun 2018, 3 commits
  23. 03 Jun 2018, 1 commit
  24. 29 May 2018, 2 commits
  25. 25 May 2018, 2 commits
    • net: include hash policy in LAG changeupper info · f44aa9ef
      Committed by John Hurley
      LAG upper event notifiers contain the tx type used by the LAG device.
      Extend this to also include the hash policy used for tx types that
      utilize hashing.
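      
      A sketch of the extended info structure per the description (field
      names follow the mainline struct):
      
      	struct netdev_lag_upper_info {
      		enum netdev_lag_tx_type tx_type;
      		/* Hash policy; valid for hash-based tx types. */
      		enum netdev_lag_hash hash_type;
      	};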
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xdp: change ndo_xdp_xmit API to support bulking · 735fc405
      Committed by Jesper Dangaard Brouer
      This patch changes the ndo_xdp_xmit API to support bulking
      xdp_frames.
      
      When the kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge
      slowdown. Most of the slowdown is caused by the DMA API's indirect
      function calls, but the net_device->ndo_xdp_xmit() call contributes
      as well.
      
      Benchmarking the patch with CONFIG_RETPOLINE, using xdp_redirect_map
      with a single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed the
      performance improvements:
       for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
       for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
      
      With frames available as a bulk inside the driver's ndo_xdp_xmit call,
      further optimizations are possible, like bulk DMA-mapping for TX.
      
      Testing without CONFIG_RETPOLINE shows the same performance for
      physical NIC drivers.
      
      The virtual NIC driver tun sees a huge performance boost, as it can
      avoid per-frame producer locking and instead amortize the locking
      cost over the bulk.
      
      V2: Fix compile errors reported by kbuild test robot <lkp@intel.com>
      V4: Isolated ndo, driver changes and callers.
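      
      For orientation, the bulked ndo at this point in the series has
      roughly this shape (it returns the number of frames sent; a flags
      argument arrived only in a later patch):
      
      	/* Drivers now receive an array of n xdp_frames in a single
      	 * indirect call instead of one call per frame.
      	 */
      	int (*ndo_xdp_xmit)(struct net_device *dev, int n,
      			    struct xdp_frame **frames);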
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>