1. 16 Nov 2018, 1 commit
  2. 15 Nov 2018, 4 commits
  3. 13 Nov 2018, 3 commits
  4. 12 Nov 2018, 10 commits
    • net_sched: sch_fq: add dctcp-like marking · 48872c11
      Eric Dumazet committed
      Similar to 80ba92fa ("codel: add ce_threshold attribute")
      
      After EDT adoption, it became easier to implement DCTCP-like CE marking.
      
      In many cases, queues are not building in the network fabric but on
      the hosts themselves.
      
      If packets leaving fq missed their Earliest Departure Time by XXX usec,
      we mark them with ECN CE. This gives feedback (after one RTT) to
      the sender to slow down and find a better operating mode.
      
      Example :
      
      tc qd replace dev eth0 root fq ce_threshold 2.5ms
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: tsq: no longer use limit_output_bytes for paced flows · c73e5807
      Eric Dumazet committed
      FQ pacing guarantees that paced packets queued by one flow do not
      add head-of-line blocking for other flows.
      
      After TCP GSO conversion, increasing limit_output_bytes to 1 MB is safe,
      since this maps to 16 skbs at most in qdisc or device queues.
      (or slightly more if some drivers lower {gso_max_segs|size})
      
      We can still queue at most 1 ms worth of traffic (this can be scaled
      by wifi drivers if they need to).
      
      Tested:
      
      # ethtool -c eth0 | egrep "tx-usecs:|tx-frames:" # 40 Gbit mlx4 NIC
      tx-usecs: 16
      tx-frames: 16
      # tc qdisc replace dev eth0 root fq
      # for f in {1..10};do netperf -P0 -H lpaa24,6 -o THROUGHPUT;done
      
      Before patch:
      27711
      26118
      27107
      27377
      27712
      27388
      27340
      27117
      27278
      27509
      
      After patch:
      37434
      36949
      36658
      36998
      37711
      37291
      37605
      36659
      36544
      37349
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: get rid of tcp_tso_should_defer() dependency on HZ/jiffies · a682850a
      Eric Dumazet committed
      tcp_tso_should_defer()'s first heuristic is to not defer
      if the last send is "old enough".

      Its current implementation uses jiffies and their low granularity.

      TSO autodefer performance should not rely on kernel HZ :/
      
      After EDT conversion, we have state variables in nanoseconds that
      can allow us to properly implement the heuristic.
      
      This patch increases TSO chunk sizes on medium rate flows,
      especially when receivers do not use GRO or similar aggregation.
      
      It also reduces bursts for HZ=100 or HZ=250 kernels, making TCP
      behavior more uniform.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: refine tcp_tso_should_defer() after EDT adoption · f1c6ea38
      Eric Dumazet committed
      tcp_tso_should_defer()'s last step tries to check whether the probable
      next ACK packet is coming in less than half an rtt.

      The problem is that head->tstamp might be in the future,
      so we need to use signed arithmetic to avoid overflows.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: do not try to defer skbs with eor mark (MSG_EOR) · 1c09f7d0
      Eric Dumazet committed
      Applications using MSG_EOR are giving a strong hint to the TCP stack:

      Subsequent sendmsg() calls cannot append more bytes to skbs carrying
      the EOR mark.

      Do not try to TSO-defer such skbs; there is really no hope.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: minor optimization in tcp ack fast path processing · 5e13a0d3
      Yafang Shao committed
      A bitwise operation is a little faster, so replace after() with a
      test of the flag FLAG_SND_UNA_ADVANCED, which is already set at
      this point.

      In addition, there is a similar improvement in tcp_cwnd_reduction().
      
      Cc: Joe Perches <joe@perches.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • act_mirred: clear skb->tstamp on redirect · 7236ead1
      Eric Dumazet committed
      If sch_fq is used at ingress, skbs that might have been
      timestamped by net_timestamp_set() (when a packet capture is
      requesting timestamps) could be delayed by an arbitrary amount of
      time, since the sch_fq time base is MONOTONIC.

      Fix this problem by moving code from sch_netem.c to act_mirred.c.
      
      Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix link re-establish failure · 7ab412d3
      Jon Maloy committed
      When a link failure is detected locally, the link is reset, the flag
      link->in_session is set to false, and a RESET_MSG with the 'stopping'
      bit set is sent to the peer.
      
      The purpose of this bit is to inform the peer that this endpoint
      is just going down, and that the peer should handle the reception of this
      particular RESET message as a local failure. This forces the peer to
      accept another RESET or ACTIVATE message from this endpoint before it
      can re-establish the link. This again is necessary to ensure that
      link session numbers are properly exchanged before the link comes up
      again.
      
      If a failure is detected at the peer endpoint at the same time, the
      peer will do the same, which is also correct behavior.
      
      However, when receiving such messages, the endpoints will not
      distinguish between 'stopping' RESETs and ordinary ones when it comes
      to updating session numbers. Both endpoints will copy the received
      session number and set their 'in_session' flags to true at the
      reception, while they are still expecting another RESET from the
      peer before they can go ahead and re-establish. This is contradictory,
      since, after applying the validation check referred to below, the
      'in_session' flag will cause rejection of all such messages, and the
      link will never come up again.
      
      We now fix this by not only handling received RESET/STOPPING messages
      as a local failure, but also by not setting a new session number
      and the 'in_session' flag in such cases.
      
      Fixes: 7ea817f4 ("tipc: check session number before accepting link protocol messages")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: improve broadcast retransmission algorithm · 31c4f4cc
      LUU Duc Canh committed
      Currently, the broadcast retransmission algorithm is using the
      'prev_retr' field in struct tipc_link to time stamp the latest broadcast
      retransmission occasion. This helps restrict the retransmission of
      individual broadcast packets to at most once per 10 milliseconds, even
      when all other criteria for retransmission are met.
      
      We now move this time stamp to the control block of each individual
      packet, and remove other limiting criteria. This simplifies the
      retransmission algorithm, and eliminates any risk of logical errors
      in selecting which packets can be retransmitted.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: register callbacks for indirect tc block binds · 7f76fa36
      John Hurley committed
      Currently drivers can register to receive TC block bind/unbind callbacks
      by implementing the setup_tc ndo in any of their given netdevs. However,
      drivers may also be interested in binds to higher level devices (e.g.
      tunnel drivers) to potentially offload filters applied to them.
      
      Introduce indirect block devs, which allow drivers to register callbacks
      for block binds on other devices. The callback is triggered when the
      device is bound to a block, allowing the driver to register for rules
      applied to that block using already available functions.
      
      Freeing an indirect block callback will trigger an unbind event (if
      necessary) to direct the driver to remove any offloaded rules and
      unregister any block rule callbacks. It is the responsibility of the
      implementing driver to clean up any registered indirect block callbacks
      before exiting, if the block is still active at such a time.
      
      Allow registering an indirect block dev callback for a device that is
      already bound to a block. In this case (if it is an ingress block),
      register and also trigger the callback meaning that any already installed
      rules can be replayed to the calling driver.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 11 Nov 2018, 5 commits
  6. 10 Nov 2018, 3 commits
  7. 09 Nov 2018, 14 commits