1. 26 Sep, 2018 6 commits
  2. 25 Sep, 2018 13 commits
  3. 24 Sep, 2018 1 commit
    • mlxsw: Make MLXSW_SP1_FWREV_MINOR a hard requirement · 12ba7e10
      Petr Machata committed
      Up until now, mlxsw tolerated firmware versions that weren't exactly
      matching the required version, if the branch number matched. That
      allowed the users to test various firmware versions as long as they were
      on the right branch.
      
      On the other hand, it made it impossible for mlxsw to put a hard lower
      bound on a version that fixes all problems known to date. If a user had
      a somewhat older FW version installed, mlxsw would start up just fine,
      possibly performing non-optimally as it would use features that trigger
      problematic behavior.
      
      Therefore tweak the check to accept any FW version that is:
      
      - on the same branch as the preferred version, and
      - the same as or newer than the preferred version.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
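The acceptance rule described in this commit can be sketched as follows. This is a minimal illustration, not the driver's code: `FwVersion`, `fw_version_acceptable`, the representation of a version as a branch number plus an ordered release number within that branch, and the version numbers themselves are all assumptions made for the example.

```python
from typing import NamedTuple


class FwVersion(NamedTuple):
    """Hypothetical firmware version: a branch plus a release within it."""
    branch: int
    release: int


def fw_version_acceptable(running: FwVersion, preferred: FwVersion) -> bool:
    """Accept only versions on the preferred branch that are not older."""
    if running.branch != preferred.branch:
        return False  # a different branch is rejected, however new it is
    return running.release >= preferred.release


preferred = FwVersion(branch=13, release=1703)  # made-up numbers
print(fw_version_acceptable(FwVersion(13, 1703), preferred))  # same version
print(fw_version_acceptable(FwVersion(13, 1800), preferred))  # newer, same branch
print(fw_version_acceptable(FwVersion(13, 1600), preferred))  # older, same branch
print(fw_version_acceptable(FwVersion(14, 2000), preferred))  # other branch
```

Under the previous behavior described above, the third case (older release on the right branch) would also have been tolerated.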
  4. 23 Sep, 2018 4 commits
  5. 22 Sep, 2018 16 commits
    • Merge branch 'net-dsa-b53-SGMII-modes-fixes' · bd4d08da
      David S. Miller committed
      Florian Fainelli says:
      
      ====================
      net: dsa: b53: SGMII modes fixes
      
      Here are two additional fixes that are required in order for SGMII to
      work correctly. This was discovered while using a copper SFP, which
      makes us use SGMII mode: we would actually leave the HW configured in
      its default mode, Fiber.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Also include SGMII for mac_config and mac_link_state · 55a4d2ea
      Florian Fainelli committed
      In both 802.3z and SGMII modes we need to configure the MAC accordingly
      to flip between Fiber and SGMII modes, and we need to read the MAC
      status from the SGMII in-band control word.
      
      Fixes: 0e01491d ("net: dsa: b53: Add SerDes support")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Fix B53_SERDES_DIGITAL_CONTROL offset · 2cae8c07
      Florian Fainelli committed
      The math went wrong: to get 0x20, we need to do 0x1e + (x) * 2, not
      0x18. Fix that offset so we access the correct registers. The wrong
      offset made us miss the correct SerDes Digital control words (status
      would be fine), so we were not correctly flipping between Fiber and
      SGMII modes, resulting in incorrect status words being pulled into the
      SerDes digital status register.
      
      Fixes: 0e01491d ("net: dsa: b53: Add SerDes support")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
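The offset arithmetic from this commit message can be checked with a short sketch. The function names are hypothetical, and the old form `0x18 + x * 2` is an assumption inferred from the message's "not 0x18"; only the corrected form `0x1e + (x) * 2` and the target value 0x20 come from the text.

```python
def serdes_digital_control_old(x: int) -> int:
    """Assumed pre-fix offset computation (base 0x18)."""
    return 0x18 + x * 2


def serdes_digital_control_fixed(x: int) -> int:
    """Offset computation as described in the commit message (base 0x1e)."""
    return 0x1E + x * 2


# With x = 1 the fixed form reaches the 0x20 register named in the message,
# while the assumed old form lands on a different register.
print(hex(serdes_digital_control_fixed(1)))
print(hex(serdes_digital_control_old(1)))
```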
    • net: dsa: b53: Don't assign autonegotiation enabled · e24cf6b3
      Florian Fainelli committed
      PHYLINK takes care of filling the right information into
      state->an_enabled, so get rid of the read from the SerDes's BMCR register.
      
      Fixes: 0e01491d ("net: dsa: b53: Add SerDes support")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • decnet: Remove unnecessary check for dev->name · 5b9b0a80
      Nathan Chancellor committed
      Clang warns that the address of an array will always evaluate to true
      in a boolean context.
      
      net/decnet/dn_dev.c:1366:10: warning: address of array 'dev->name' will
      always evaluate to 'true' [-Wpointer-bool-conversion]
                                      dev->name ? dev->name : "???",
                                      ~~~~~^~~~ ~
      1 warning generated.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/116
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests/net: add ipv6 tests to ip_defrag selftest · bccc1711
      Peter Oskolkov committed
      This patch adds ipv6 defragmentation tests to ip_defrag selftest,
      to complement existing ipv4 tests.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipfrag: let ip[6]frag_high_thresh in ns be higher than in init_net · 83619623
      Peter Oskolkov committed
      Currently, ip[6]frag_high_thresh sysctl values in new namespaces are
      hard-limited to those of the root/init ns.
      
      There are at least two use cases where it would be desirable to
      set the high_thresh values higher in a child namespace than the global
      hard limit:
      
      - a security/ddos protection policy may lower the thresholds in the
        root/init ns but allow for a special exception in a child namespace
      - testing: a test running in a namespace may want to set these
        thresholds higher in its namespace than what is in the root/init ns
      
      The new behavior:
      
       # ip netns add testns
       # ip netns exec testns bash
      
       # sysctl -w net.ipv4.ipfrag_high_thresh=9000000
       net.ipv4.ipfrag_high_thresh = 9000000
      
       # sysctl net.ipv4.ipfrag_high_thresh
       net.ipv4.ipfrag_high_thresh = 9000000
      
       # sysctl -w net.ipv6.ip6frag_high_thresh=9000000
       net.ipv6.ip6frag_high_thresh = 9000000
      
       # sysctl net.ipv6.ip6frag_high_thresh
       net.ipv6.ip6frag_high_thresh = 9000000
      
      The old behavior:
      
       # ip netns add testns
       # ip netns exec testns bash
      
       # sysctl -w net.ipv4.ipfrag_high_thresh=9000000
       net.ipv4.ipfrag_high_thresh = 9000000
      
       # sysctl net.ipv4.ipfrag_high_thresh
       net.ipv4.ipfrag_high_thresh = 4194304
      
       # sysctl -w net.ipv6.ip6frag_high_thresh=9000000
       net.ipv6.ip6frag_high_thresh = 9000000
      
       # sysctl net.ipv6.ip6frag_high_thresh
       net.ipv6.ip6frag_high_thresh = 4194304
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
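The difference between the two transcripts above can be modelled with a small sketch. It is not the kernel's sysctl handler: the function names are hypothetical, the value 4194304 is taken from the "old behavior" transcript and assumed to be the root/init namespace limit, and any other bounds the kernel applies (such as a lower threshold) are deliberately left out.

```python
INIT_NET_HIGH_THRESH = 4194304  # value read back in the old-behavior transcript


def effective_high_thresh_old(requested: int,
                              init_net_limit: int = INIT_NET_HIGH_THRESH) -> int:
    """Old behavior: a child namespace is capped at the init namespace value."""
    return min(requested, init_net_limit)


def effective_high_thresh_new(requested: int) -> int:
    """New behavior: the child namespace keeps the value it asked for."""
    return requested


requested = 9000000  # value written with sysctl -w in both transcripts
print(effective_high_thresh_old(requested))
print(effective_high_thresh_new(requested))
```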
    • ipv6: discard IP frag queue on more errors · 2475f59c
      Peter Oskolkov committed
      This is similar to how ipv4 now behaves:
      commit 0ff89efb ("ip: fail fast on IP defrag errors").
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: avoid compile error in fib_info_nh_uses_dev · 075e264f
      Eric Dumazet committed
      net/ipv4/fib_frontend.c: In function 'fib_info_nh_uses_dev':
      net/ipv4/fib_frontend.c:322:6: error: unused variable 'ret' [-Werror=unused-variable]
      cc1: all warnings being treated as errors
      
      Fixes: 78f2756c ("net/ipv4: Move device validation to helper")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-switch-to-Early-Departure-Time-model' · a88e24f2
      David S. Miller committed
      Eric Dumazet says:
      
      ====================
      tcp: switch to Early Departure Time model
      
      In the early days, pacing was implemented in sch_fq (FQ)
      in a generic way:
      
      - SO_MAX_PACING_RATE could be used by any socket.
      
      - TCP would vary effective pacing rate based on CWND*MSS/SRTT
      
      - FQ would ensure delays between packets based on current
        sk->sk_pacing_rate, but with some quantum-based artifacts
        (inflating RPC tail latencies)
      
      - BBR then tweaked the pacing rate in its various phases
        (PROBE, DRAIN, ...)
      
      This worked reasonably well, but had the side effect that TCP RTT
      samples would be inflated by the sojourn time of the packets in FQ.
      
      Also note that when FQ is not used and TCP wants pacing, the
      internal pacing fallback has very different behavior, since TCP
      emits packets at the time they should be sent (with unreasonable
      assumptions about scheduling costs)
      
      Van Jacobson gave a talk at Netdev 0x12 in Montreal about letting
      TCP (or applications, for UDP messages) decide the Earliest
      Departure Time, instead of letting packet schedulers derive it
      from the pacing rate.
      
      https://www.netdevconf.org/0x12/session.html?evolving-from-afap-teaching-nics-about-time
      https://www.files.netdevconf.org/d/46def75c2ef345809bbe/files/?p=/Evolving%20from%20AFAP%20%E2%80%93%20Teaching%20NICs%20about%20time.pdf
      
      Recent additions to Linux provided SO_TXTIME and a new ETF qdisc
      supporting the new skb->tstamp role.
      
      This patch series converts TCP and FQ to the same model.
      
      This might in the future allow us to relax tight TSQ limits
      (if FQ is present in the output path), and thus lower
      number of callbacks to tcp_write_xmit(), thanks to batching.
      
      This will be followed by an FQ change allowing SO_TXTIME support,
      so that QUIC servers can let the pacing be done in FQ (or
      offloaded if the network device permits).
      
      For example, a TCP flow rated at 24Mbps now shows a more meaningful RTT.
      
      Before :
      
      ESTAB  0  211408 10.246.7.151:41558   10.246.7.152:33723
      	 cubic wscale:8,8 rto:203 rtt:2.195/0.084 mss:1448 rcvmss:536
        advmss:1448 cwnd:20 ssthresh:20 bytes_acked:36897937
        segs_out:25488 segs_in:12454 data_segs_out:25486
        send 105.5Mbps lastsnd:1 lastrcv:12851 lastack:1
        pacing_rate 24.0Mbps/24.0Mbps delivery_rate 22.9Mbps
        busy:12851ms unacked:4 rcv_space:29200 notsent:205616 minrtt:0.026
      
      After :
      
      ESTAB  0  192584 10.246.7.151:61612   10.246.7.152:34375
      	 cubic wscale:8,8 rto:201 rtt:0.165/0.129 mss:1448 rcvmss:536
        advmss:1448 cwnd:20 ssthresh:20 bytes_acked:170755401
        segs_out:117931 segs_in:57651 data_segs_out:117929
        send 1404.1Mbps lastsnd:1 lastrcv:56915 lastack:1
        pacing_rate 24.0Mbps/24.0Mbps delivery_rate 24.2Mbps
        busy:56915ms unacked:4 rcv_space:29200 notsent:186792 minrtt:0.054
      
      A nice side effect of this patch series is a reduction of max/p99
      latencies of RPC workloads, since the FQ quantum no longer adds
      artifacts.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
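The "send" figures in the two ss outputs above are consistent with the CWND*MSS/SRTT relation mentioned in the cover letter. The sketch below assumes that the figure is computed as cwnd * mss * 8 / rtt; the function name is hypothetical, and the inputs are copied from the Before and After outputs.

```python
def send_rate_mbps(cwnd: int, mss: int, rtt_ms: float) -> float:
    """Rate implied by sending one congestion window per RTT, in Mbps."""
    bits_per_window = cwnd * mss * 8
    return bits_per_window / (rtt_ms / 1000.0) / 1e6


# cwnd and mss are identical in both outputs; only the measured RTT differs.
before = send_rate_mbps(cwnd=20, mss=1448, rtt_ms=2.195)
after = send_rate_mbps(cwnd=20, mss=1448, rtt_ms=0.165)
print(f"before: {before:.1f} Mbps")
print(f"after:  {after:.1f} Mbps")
```

Because the window is unchanged, the jump from 105.5Mbps to 1404.1Mbps comes entirely from the RTT sample no longer including the time packets spent queued in FQ.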
    • net_sched: sch_fq: remove dead code dealing with retransmits · 90caf67b
      Eric Dumazet committed
      With the earliest departure time model, we no longer plan
      to special-case TCP retransmits. We therefore remove dead
      code (since most compilers understood skb_is_retransmit()
      was false).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: switch tcp_internal_pacing() to tcp_wstamp_ns · c092dd5f
      Eric Dumazet committed
      Now that TCP keeps track of tcp_wstamp_ns, recording the earliest
      departure time of the next packet, we can remove duplicate code
      from tcp_internal_pacing().
      
      This removes one ktime_get_tai_ns() call, and a divide.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: switch tcp and sch_fq to new earliest departure time model · ab408b6d
      Eric Dumazet committed
      TCP keeps track of tcp_wstamp_ns by itself, meaning sch_fq
      no longer has to do it.
      
      Thanks to this model, TCP can get more accurate RTT samples,
      since pacing no longer inflates them.
      
      This has the nice effect of removing some delays caused by the FQ
      quantum mechanism, which inflated max/P99 latencies.
      
      Also, we might relax the tight TCP Small Queue limits in the future,
      since this new model allows TCP to build bigger batches: sch_fq
      (or a device with earliest departure time offload) ensures
      these packets will be delivered on time.
      
      Note that other protocols are not converted (they probably
      never will be), so sch_fq still has support for SO_MAX_PACING_RATE.
      
      Tested:
      
      Test showing FQ pacing quantum artifact for low-rate flows,
      adding unexpected throttles for RPC flows, inflating max and P99 latencies.
      
      The parameters chosen here show what typically happens when
      a TCP flow has a reduced pacing rate (this can be caused by a reduced
      cwnd after a few losses, and/or an RTT above a few ms).
      
      MIBS="MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY,P99_LATENCY,STDDEV_LATENCY"
      Before :
      $ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind
       Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds
      19,82.78,5279,3825,482.02
      
      After :
      $ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind
      Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds
      20,49.94,128,63,3.18
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
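The idea of the sender tracking its own earliest departure time can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel implementation: the `EdtPacer` class and its methods are hypothetical, `wstamp_ns` only loosely mirrors the role the commit gives to tcp_wstamp_ns, and the update rule (never schedule in the past, then advance by packet length divided by pacing rate) is an assumed simplification.

```python
NSEC_PER_SEC = 1_000_000_000


class EdtPacer:
    """Toy earliest-departure-time pacer for a single flow."""

    def __init__(self) -> None:
        self.wstamp_ns = 0  # earliest departure time of the next packet

    def stamp_packet(self, now_ns: int, length_bytes: int,
                     rate_bytes_per_sec: int) -> int:
        """Return the departure time for this packet and advance the clock."""
        # Never schedule in the past: an idle flow gets no burst credit.
        self.wstamp_ns = max(self.wstamp_ns, now_ns)
        departure_ns = self.wstamp_ns
        # The next packet may leave once this one has been serialized
        # at the pacing rate.
        self.wstamp_ns += length_bytes * NSEC_PER_SEC // rate_bytes_per_sec
        return departure_ns


pacer = EdtPacer()
rate = 3_000_000  # bytes per second, i.e. 24 Mbps
for now_ns in (0, 0, 2_000_000):
    print(pacer.stamp_packet(now_ns, 1500, rate))
```

In this model a qdisc or device only has to hold each packet until its stamped time; it does not need to know the pacing rate, which is the division of labor the commit message describes.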
    • tcp: switch internal pacing timer to CLOCK_TAI · fd2bca2a
      Eric Dumazet committed
      The next patch will use tcp_wstamp_ns to feed the internal
      TCP pacing timer, so switch to CLOCK_TAI to share the same base.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: provide earliest departure time in skb->tstamp · d3edd06e
      Eric Dumazet committed
      Switch internal TCP skb->skb_mstamp to skb->skb_mstamp_ns,
      from usec units to nsec units.
      
      Do not clear skb->tstamp before entering IP stacks in TX,
      so that qdiscs or devices can implement pacing based on the
      earliest departure time instead of the socket's sk->sk_pacing_rate.
      
      Packets are fed with tcp_wstamp_ns, and the following patch
      will update tcp_wstamp_ns when both TCP and sch_fq switch to
      the earliest departure time mechanism.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add tcp_wstamp_ns socket field · 9799ccb0
      Eric Dumazet committed
      TCP will soon provide earliest departure time on TX skbs.
      It needs to track this in a new variable.
      
      tcp_mstamp_refresh() needs to update this variable, and
      has become too big to stay inline.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>