1. 22 Jul, 2018 16 commits
    • Merge branch 'tcp-improve-setsockopt-TCP_USER_TIMEOUT-accuracy' · d1afdc51
      David S. Miller committed
      Jon Maxwell says:
      
      ====================
      tcp: improve setsockopt() TCP_USER_TIMEOUT accuracy
      
      The patch kept growing as feedback came in, so V4 is implemented as a
      series of 3 commits instead.
      
      This series is a continuation based on V3 here and associated feedback:
      
      https://patchwork.kernel.org/patch/10516195/
      
      Suggestions by Neal Cardwell:
      
      1) Fix up units mismatch regarding msec/jiffies.
      2) Address possibility of time_remaining being negative.
      3) Add a helper routine tcp_clamp_rto_to_user_timeout() to do the rto
      calculation.
      4) Move start_ts logic into helper routine tcp_retransmit_stamp() to
      validate tcp_sk(sk)->retrans_stamp.
      5) Some u32 declaration and return refactoring.
      6) Return 0 instead of false in tcp_retransmit_stamp(), since it's not
      a bool.
      
      Suggestions by David Laight:
      
      1) Don't cache rto in tcp_clamp_rto_to_user_timeout().
      
      Suggestions by Eric Dumazet:
      
      1) Make u32 declarations consistent.
      2) Use patch series for easier review.
      3) Convert icsk->icsk_user_timeout to milliseconds to avoid the jiffies
      to msec dance.
      4) Use separate titles for each commit in the series.
      5) Fix fuzzy indentation and line wrap issues.
      6) Make commit titles descriptive.
      
      Changes:
      
      1) Call tcp_clamp_rto_to_user_timeout(sk) as an argument to
      inet_csk_reset_xmit_timer() to save on rto declaration.
      
      Every time the TCP retransmission timer fires, it checks whether a
      timeout has occurred before scheduling the next retransmit timer. The
      interval between retransmissions increases exponentially. The issue is
      that the timeout can only be detected when the retransmit timer fires
      again. If the user timeout check happens after the 9th retransmit, for
      example, it has to wait for the 10th retransmit timer to fire before it
      can evaluate whether a timeout has occurred. If that interval is large
      enough, the timeout will be inaccurate.
      
      For example, with a TCP_USER_TIMEOUT of 10 seconds and without the patch:
      
      1st retransmit:
      
      22:25:18.973488 IP host1.49310 > host2.search-agent: Flags [.]
      
      Last retransmit:
      
      22:25:26.205499 IP host1.49310 > host2.search-agent: Flags [.]
      
      Timeout:
      
      send: Connection timed out
      Sun Jul  1 22:25:34 EDT 2018
      
      We can see that the last retransmit interval took ~7 seconds, which
      pushed the total timeout to ~15 seconds instead of the expected 10
      seconds. The inaccuracy grows with the TCP_USER_TIMEOUT value, because
      the retransmit interval increases.
      
      Add tcp_clamp_rto_to_user_timeout() to determine whether the user
      timeout has expired, or whether the rto interval needs to be
      recalculated. Use the original interval if the user timeout is not set.
      
      With the patch, the test shows the expected 10 second timeout:
      
      1st retransmit:
      
      01:37:59.022555 IP host1.49310 > host2.search-agent: Flags [.]
      
      Last retransmit:
      
      01:38:06.486558 IP host1.49310 > host2.search-agent: Flags [.]
      
      Timeout:
      
      send: Connection timed out
      Mon Jul  2 01:38:09 EDT 2018
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
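The overshoot described in the cover letter can be reproduced with a small user-space model. The following is only a sketch under stated assumptions, not the kernel code: the names `clamp_rto_to_user_timeout()` and `simulate_timeout()`, the 200 ms initial RTO, and measuring elapsed time from the first transmission are all hypothetical simplifications, and the kernel's upper bound on the RTO is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model; all times are in milliseconds. */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/*
 * Pick the next timer interval. With no user timeout configured (0),
 * keep the original RTO. Otherwise never sleep past the deadline.
 */
static uint32_t clamp_rto_to_user_timeout(uint32_t rto_ms,
                                          uint32_t user_timeout_ms,
                                          uint32_t elapsed_ms)
{
    if (user_timeout_ms == 0)
        return rto_ms;
    if (elapsed_ms >= user_timeout_ms)
        return 1; /* deadline already passed: fire as soon as possible */
    return min_u32(rto_ms, user_timeout_ms - elapsed_ms);
}

/*
 * Simulate retransmit timer firings with exponential backoff and return
 * the time at which the timeout is finally noticed.
 */
static uint32_t simulate_timeout(uint32_t initial_rto_ms,
                                 uint32_t user_timeout_ms, int clamp)
{
    uint32_t now = 0, rto = initial_rto_ms;

    assert(initial_rto_ms > 0 && user_timeout_ms > 0);
    for (;;) {
        uint32_t interval = clamp
            ? clamp_rto_to_user_timeout(rto, user_timeout_ms, now)
            : rto;

        now += interval;              /* the timer fires */
        if (now >= user_timeout_ms)
            return now;               /* the timeout is only noticed here */
        rto *= 2;                     /* back off for the next retransmit */
    }
}
```

In this model, a 10000 ms user timeout with a 200 ms initial RTO is noticed at 12600 ms without clamping and at 10000 ms with clamping, which mirrors the kind of gap shown in the captures above.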
    • tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy · b701a99e
      Jon Maxwell committed
      Create the tcp_clamp_rto_to_user_timeout() helper routine to calculate
      the correct rto, so that the TCP_USER_TIMEOUT socket option is more
      accurate. This takes into account suggestions and feedback from
      Eric Dumazet, Neal Cardwell and David Laight. Thanks to the 1st commit,
      we can avoid the msecs_to_jiffies() and jiffies_to_msecs() dance.
      Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Add tcp_retransmit_stamp() helper routine · a7fa3770
      Jon Maxwell committed
      Create a separate helper routine, as per Neal Cardwell's suggestion, to
      be used by the final commit in this series and by retransmits_timed_out().
      Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: convert icsk_user_timeout from jiffies to msecs · 9bcc66e1
      Jon Maxwell committed
      This is a preparatory commit, part of the series that improves the
      accuracy of the TCP_USER_TIMEOUT socket option. Implement Eric Dumazet's
      idea to convert icsk->icsk_user_timeout from jiffies to msecs, which
      eliminates the msecs_to_jiffies() and jiffies_to_msecs() dance in future.
      Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
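To illustrate why keeping the value in milliseconds is simpler, here is a small user-space sketch of the conversion "dance". Everything in it is an assumption made for illustration: the tick rate `SKETCH_HZ` of 100, the `sketch_*` conversion helpers (which round up to whole ticks), and the `sketch_sock` structure are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HZ        100u                 /* hypothetical: 1 jiffy = 10 ms */
#define SKETCH_MS_PER_J  (1000u / SKETCH_HZ)

/* Round a millisecond value up to whole ticks. */
static uint32_t sketch_msecs_to_jiffies(uint32_t ms)
{
    return (ms + SKETCH_MS_PER_J - 1) / SKETCH_MS_PER_J;
}

static uint32_t sketch_jiffies_to_msecs(uint32_t j)
{
    return j * SKETCH_MS_PER_J;
}

/* Storing in jiffies: the value read back is quantized to the tick size. */
static uint32_t roundtrip_via_jiffies(uint32_t ms)
{
    return sketch_jiffies_to_msecs(sketch_msecs_to_jiffies(ms));
}

/* Storing in milliseconds: no conversion on either side. */
struct sketch_sock {
    uint32_t user_timeout_ms;
};

static void sketch_set_user_timeout(struct sketch_sock *sk, uint32_t ms)
{
    assert(sk);
    sk->user_timeout_ms = ms;
}
```

With the assumed 10 ms tick, a 15 ms value comes back as 20 ms after the round trip, while the millisecond field returns exactly what was stored.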
    • Merge branch 's390-qeth-updates' · 975cd350
      David S. Miller committed
      Julian Wiedmann says:
      
      ====================
      s390/qeth: updates 2018-07-19
      
      Please apply one more round of qeth patches to net-next.
      This brings additional performance improvements for the transmit code,
      and some refactoring to pave the way for using netdev_priv.
      Also, two minor fixes for rare corner cases.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: speed up L2 IQD xmit · 5f89eca5
      Julian Wiedmann committed
      Modify the L2 OSA xmit path so that it also supports L2 IQD devices
      (in particular, their HW header requirements). This allows IQD devices
      to advertise NETIF_F_SG support, and eliminates the allocation overhead
      for the HW header.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: add support for constrained HW headers · a7c2f4a3
      Julian Wiedmann committed
      Some transmit modes require that the HW header is located in the same
      page as the initial protocol headers in skb->data. Let callers specify
      the size of this contiguous header range, and enforce it when building
      the HW header.
      
      While at it, apply some gentle renaming to the relevant L2 code so that
      it matches the L3 code.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: merge linearize-check into HW header construction · ba86ceee
      Julian Wiedmann committed
      When checking whether an skb needs to be linearized to fit into an IO
      buffer, it's desirable to consider the skb's final size and layout
      (i.e. after the HW header was added). But a subsequent linearization can
      then cause the re-positioned HW header to violate its alignment
      restrictions.
      
      Dealing with this situation in two different code paths is quite tricky.
      This patch integrates a) linearize-check and b) HW header construction
      into one 3-step sequence:
      1. evaluate how the HW header needs to be added (to identify if it takes
         up an additional buffer element), then
      2. check if the required buffer elements exceed the device's limit.
         Linearize when necessary and re-evaluate the HW header placement.
      3. Add the HW header in the best-possible way:
         a) push, without taking up an additional buffer element
         b) push, but consume another buffer element
         c) allocate a header object from the cache.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
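The 3-step sequence described in this commit can be sketched as a small user-space model. This is an illustration under stated assumptions, not the driver's code: the `pkt` structure, the `hdr_mode` values, `plan_hw_header()`, a 4096-byte page as one buffer element, and a linearization that simply grows the linear area in place are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_PAGE_SIZE 4096UL

enum hdr_mode { HDR_PUSH, HDR_PUSH_NEW_ELEM, HDR_FROM_CACHE, HDR_ERROR };

/* Hypothetical packet model; not the driver's data structures. */
struct pkt {
    unsigned long data;      /* address of the first protocol header byte */
    unsigned long headroom;  /* free bytes directly in front of data */
    unsigned long len;       /* bytes in the linear area */
    unsigned frag_elems;     /* buffer elements used by paged fragments */
    unsigned long frag_len;  /* bytes held in paged fragments */
};

/* Pages (one page = one buffer element) touched by [start, end). */
static unsigned elems_for_range(unsigned long start, unsigned long end)
{
    if (end <= start)
        return 0;
    return (unsigned)((end - 1) / SK_PAGE_SIZE - start / SK_PAGE_SIZE) + 1;
}

/* Step 1: decide how the header would be added, and count elements. */
static enum hdr_mode evaluate(const struct pkt *p, unsigned long hdr_len,
                              unsigned *elems)
{
    unsigned data_elems;

    assert(p->len > 0 && hdr_len > 0 && hdr_len <= SK_PAGE_SIZE);
    data_elems = elems_for_range(p->data, p->data + p->len) + p->frag_elems;

    if (p->headroom >= hdr_len) {
        unsigned long start = p->data - hdr_len;

        if (elems_for_range(start, p->data + 1) == 1) {
            *elems = data_elems;      /* a) header shares the first element */
            return HDR_PUSH;
        }
        if (elems_for_range(start, p->data) == 1) {
            *elems = 1 + data_elems;  /* b) header takes another element */
            return HDR_PUSH_NEW_ELEM;
        }
    }
    *elems = 1 + data_elems;          /* c) separately allocated header */
    return HDR_FROM_CACHE;
}

/* Model assumption: fragment bytes are appended to the linear area. */
static void linearize(struct pkt *p)
{
    p->len += p->frag_len;
    p->frag_len = 0;
    p->frag_elems = 0;
}

static enum hdr_mode plan_hw_header(struct pkt *p, unsigned long hdr_len,
                                    unsigned max_elems, unsigned *elems)
{
    enum hdr_mode mode = evaluate(p, hdr_len, elems);        /* step 1 */

    if (*elems > max_elems) {                                /* step 2 */
        if (p->frag_elems == 0)
            return HDR_ERROR;         /* already linear, nothing to gain */
        linearize(p);
        mode = evaluate(p, hdr_len, elems);  /* layout changed, re-check */
        if (*elems > max_elems)
            return HDR_ERROR;
    }
    return mode;       /* step 3 would now add the header as decided */
}
```

The point of the single sequence is visible in `plan_hw_header()`: the placement decision is re-evaluated after linearization, so the header is never positioned based on a layout that no longer exists.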
    • s390/qeth: add statistics for consumed buffer elements · d2a274b2
      Julian Wiedmann committed
      Nowadays an skb fragment typically spans multiple pages. So replace
      the obsolete, SG-only 'fragments' counter with one that tracks the
      consumed buffer elements. This is what actually matters for performance.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: use core MTU range checking · 72f219da
      Julian Wiedmann committed
      qeth's ndo_change_mtu() only applies some trivial bounds checking. Set
      up dev->min_mtu properly, so that dev_set_mtu() can do this for us.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
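The idea of letting a common layer do the bounds checking can be sketched as follows. This is a simplified user-space model, assuming a hypothetical `sketch_netdev` structure and `sketch_set_mtu()` function; it only illustrates a range check against per-device minimum and maximum values (with a maximum of 0 meaning "no upper limit" in this sketch), not the kernel's implementation.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device model with the bounds stored on the device. */
struct sketch_netdev {
    unsigned int mtu;
    unsigned int min_mtu;
    unsigned int max_mtu;  /* 0 means "no upper limit" in this sketch */
};

/* Generic range check, so a driver callback need not repeat it. */
static int sketch_set_mtu(struct sketch_netdev *dev, int new_mtu)
{
    assert(dev);
    if (new_mtu < 0 || (unsigned int)new_mtu < dev->min_mtu)
        return -EINVAL;
    if (dev->max_mtu > 0 && (unsigned int)new_mtu > dev->max_mtu)
        return -EINVAL;
    dev->mtu = (unsigned int)new_mtu;
    return 0;
}
```

Once the driver fills in the bounds at setup time, its own change-MTU callback has nothing left to validate.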
    • s390/qeth: simplify max MTU handling · 8ce7a9e0
      Julian Wiedmann committed
      When the MPC initialization code discovers the HW-specific max MTU,
      apply the resulting changes straight to the netdevice.
      
      If this is the device's first initialization, also set its MTU
      (HiperSockets: the max MTU; else: a layer-specific default value).
      Then cap the current MTU by the new max MTU.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
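The rule described in this commit (set the MTU on first initialization, then cap it by the newly discovered maximum) can be written down as a short sketch. The `sketch_netdev` structure, the `apply_max_mtu()` function, and its parameters are hypothetical names chosen for illustration; the MTU values in any usage are likewise made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; not the driver's structures. */
struct sketch_netdev {
    unsigned int mtu;
    unsigned int max_mtu;
};

/*
 * Apply a newly discovered HW-specific max MTU straight to the device.
 * On first initialization also choose the MTU itself: the max MTU for
 * HiperSockets, otherwise a layer-specific default. Finally cap the
 * current MTU by the new max MTU.
 */
static void apply_max_mtu(struct sketch_netdev *dev, unsigned int new_max,
                          bool first_init, bool hipersockets,
                          unsigned int layer_default)
{
    assert(dev && new_max > 0);
    dev->max_mtu = new_max;
    if (first_init)
        dev->mtu = hipersockets ? new_max : layer_default;
    if (dev->mtu > new_max)
        dev->mtu = new_max;
}
```

Because the cap is applied unconditionally, a later re-initialization that reports a smaller maximum shrinks the current MTU, while a larger maximum leaves it untouched.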
    • s390/qeth: don't cache HW port number · 92d27209
      Julian Wiedmann committed
      The netdevice is always available now, so get the portno from there.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: allocate netdevice early · d3d1b205
      Julian Wiedmann committed
      Allocation of the netdevice is currently delayed until a qeth card first
      goes online. This complicates matters in several places, where we need
      to cache values instead of applying them straight to the netdevice.
      
      Improve on this by moving the allocation up to where the qeth card
      itself is created. This is also one step in the direction of eventually
      placing the qeth card into netdev_priv().
      
      In all subsequent code, remove the now redundant checks whether
      card->dev is valid.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: remove redundant netif_carrier_ok() checks · addc5ee8
      Julian Wiedmann committed
      netif_carrier_off() does its own checking.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: reset layer2 attribute on layer switch · 70551dc4
      Julian Wiedmann committed
      After the subdriver's remove() routine has completed, the card's layer
      mode is undetermined again. Reflect this in the layer2 field.
      
      If qeth_dev_layer2_store() hits an error after remove() was called, the
      card _always_ requires a setup(), even if the previous layer mode is
      requested again.
      But qeth_dev_layer2_store() bails out early if the requested layer mode
      still matches the current one. So unless we reset the layer2 field,
      re-probing the card back to its previous mode is currently not possible.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix race in used-buffer accounting · a702349a
      Julian Wiedmann committed
      By updating q->used_buffers only _after_ do_QDIO() has completed, there
      is a potential race against the buffer's TX completion. In the unlikely
      case that the TX completion path wins, qeth_qdio_output_handler() would
      decrement the counter before qeth_flush_buffers() even incremented it.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
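The ordering problem in this commit can be demonstrated with a deterministic, single-threaded sketch. It is a model only: `struct queue`, `tx_complete()`, `submit()` and the two `flush_*` variants are hypothetical stand-ins, and `submit()` deliberately runs the completion before returning to represent the unlikely case where the TX completion path wins the race.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical queue model; min_seen records the lowest counter value. */
struct queue {
    atomic_int used_buffers;
    int min_seen;
};

static void queue_init(struct queue *q)
{
    atomic_init(&q->used_buffers, 0);
    q->min_seen = 0;
}

/* Stand-in for the TX completion handler: releases 'count' buffers. */
static void tx_complete(struct queue *q, int count)
{
    int now = atomic_fetch_sub(&q->used_buffers, count) - count;

    if (now < q->min_seen)
        q->min_seen = now;
}

/* Stand-in for the submit call; completion runs before it returns. */
static void submit(struct queue *q, int count)
{
    tx_complete(q, count);
}

/* Racy order: account for the buffers only after submitting them. */
static void flush_count_after(struct queue *q, int count)
{
    assert(count > 0);
    submit(q, count);
    atomic_fetch_add(&q->used_buffers, count);
}

/* Fixed order: account first, so completion never sees a stale counter. */
static void flush_count_before(struct queue *q, int count)
{
    assert(count > 0);
    atomic_fetch_add(&q->used_buffers, count);
    submit(q, count);
}
```

With the racy order the counter dips below zero while the completion runs; with the fixed order it never does, even though both variants end at the same final value.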
  2. 21 Jul, 2018 24 commits