1. 06 Dec, 2018 1 commit
  2. 05 Dec, 2018 10 commits
    • tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT · a74f0fa0
      Eric Dumazet committed
      The TCP_NOTSENT_LOWAT socket option and sysctl were added in Linux 3.12
      as a step to enable bigger TCP sndbuf limits.
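
The per-socket form of the option can be set with setsockopt(). The following is a minimal sketch, assuming a Linux host; the helper names set_notsent_lowat() and get_notsent_lowat() are illustrative and not part of the commit, and the fallback #define is only there for older libc headers.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25 /* value from linux/tcp.h, for older libc headers */
#endif

/* Hypothetical helper: set the per-socket limit on unsent bytes.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_notsent_lowat(int fd, unsigned int bytes)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
			  &bytes, sizeof(bytes));
}

/* Hypothetical helper: read the limit back.
 * Returns 0 on success, -1 on error (errno is set by getsockopt). */
static int get_notsent_lowat(int fd, unsigned int *bytes)
{
	socklen_t len = sizeof(*bytes);

	return getsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, bytes, &len);
}
```

A caller would typically apply set_notsent_lowat(fd, 2097152) right after socket() to get the same 2 MB limit that the sysctl in the test below applies system-wide.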
      
      It works reasonably well, but has one drawback:
      
      Once the limit is reached, the TCP stack generates
      an [E]POLLOUT event for every incoming ACK packet.
      
      This causes a high number of context switches.
      
      This patch implements the strategy David Miller added
      in sock_def_write_space():
      
       - If a TCP socket has a notsent_lowat constraint of X bytes,
         allow sendmsg() to fill up to X bytes, but send [E]POLLOUT
         only if the number of notsent bytes is below X/2.
      
      This considerably reduces TCP_NOTSENT_LOWAT overhead
      while still keeping the pipe full.
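
The two thresholds described above can be modeled in user space. This is only a sketch of the hysteresis, not the kernel code; struct notsent_model and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one socket's unsent-data accounting. */
struct notsent_model {
	uint32_t lowat;   /* X: the notsent_lowat limit, in bytes */
	uint32_t notsent; /* bytes queued by the application but not yet sent */
};

/* sendmsg() side: the writer may keep queuing while unsent bytes are below X. */
static bool may_queue_more(const struct notsent_model *m)
{
	return m->notsent < m->lowat;
}

/* ACK side: signal [E]POLLOUT only once unsent bytes have dropped below X/2. */
static bool should_wake_writer(const struct notsent_model *m)
{
	return m->notsent < m->lowat / 2;
}
```

Between X/2 and X the writer is allowed to queue if it is already running, but an incoming ACK does not wake it. That gap is what removes the per-ACK wakeups.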
      
      Tested:
       100 ms RTT netem testbed between A and B, 100 concurrent TCP_STREAM
      
      A:/# cat /proc/sys/net/ipv4/tcp_wmem
      4096	262144	64000000
      A:/# super_netperf 100 -H B -l 1000 -- -K bbr &
      
      A:/# grep TCP /proc/net/sockstat
      TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 1364904 # This is about 54 MB of memory per flow :/
      
      A:/# vmstat 5 5
      procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
       r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
       0  0      0 256220672  13532 694976    0    0    10     0   28   14  0  1 99  0  0
       2  0      0 256320016  13532 698480    0    0   512     0 715901 5927  0 10 90  0  0
       0  0      0 256197232  13532 700992    0    0   735    13 771161 5849  0 11 89  0  0
       1  0      0 256233824  13532 703320    0    0   512    23 719650 6635  0 11 89  0  0
       2  0      0 256226880  13532 705780    0    0   642     4 775650 6009  0 12 88  0  0
      
      A:/# echo 2097152 >/proc/sys/net/ipv4/tcp_notsent_lowat
      
      A:/# grep TCP /proc/net/sockstat
      TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 86411 # 3.5 MB per flow
      
      A:/# vmstat 5 5  # check that context switches have not inflated too much.
      procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
       r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
       2  0      0 260386512  13592 662148    0    0    10     0   17   14  0  1 99  0  0
       0  0      0 260519680  13592 604184    0    0   512    13 726843 12424  0 10 90  0  0
       1  1      0 260435424  13592 598360    0    0   512    25 764645 12925  0 10 90  0  0
       1  0      0 260855392  13592 578380    0    0   512     7 722943 13624  0 11 88  0  0
       1  0      0 260445008  13592 601176    0    0   614    34 772288 14317  0 10 90  0  0
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'act_tunnel_key-support-key-less-tunnels' · 4dc88ce6
      David S. Miller committed
      Or Gerlitz says:
      
      ====================
      net/sched: act_tunnel_key: support key-less tunnels
      
      This short series from Adi Nissim adds support for key-less tunnels
      to the tc tunnel key actions, which is needed for some GRE use-cases.
      
      changes from V0:
       - address a build warning spotted by kbuild by always initializing
         the tunnel key to zero
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_tunnel_key: Don't dump dst port if it wasn't set · 1c25324c
      Adi Nissim committed
      It's possible to set a tunnel without a destination port. However,
      on dump(), a zero dst port is returned to user space even if it was not
      set. Fix that.
      
      Note that so far this wasn't required, because key-less tunnels were not
      supported and UDP tunnels do require a destination port.
      Signed-off-by: Adi Nissim <adin@mellanox.com>
      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_tunnel_key: Allow key-less tunnels · 80ef0f22
      Adi Nissim committed
      Allow setting a tunnel without a tunnel key. This is required for
      tunneling protocols, such as GRE, that define the key as an optional
      field.
      Signed-off-by: Adi Nissim <adin@mellanox.com>
      Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: fix spelling mistake "Dispalying" -> "Displaying" · d1ecf8a6
      Colin Ian King committed
      Fix a spelling mistake in a DP_NOTICE message.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-Add-one-armed-router-support' · 55827458
      David S. Miller committed
      Ido Schimmel says:
      
      ====================
      mlxsw: Add one-armed router support
      
      Up until now, when a packet was routed by the ASIC through the same
      router interface (RIF) from which it ingressed, the ASIC passed the
      sole copy of the packet to the kernel. This allowed the kernel to route
      the packet and also potentially generate an ICMP redirect.
      
      There are scenarios (e.g., "one-armed router") where packets are
      intentionally routed this way and are therefore not deemed as
      exceptions. In such scenarios the current method of trapping packets to
      the CPU is problematic, as it results in major packet loss.
      
      This patchset solves the problem by having the ASIC forward the packet,
      but also send a copy to the CPU, which gives the kernel the opportunity
      to generate required exceptions.
      
      To prevent the kernel from forwarding such packets again, the driver
      marks them with 'offload_l3_fwd_mark', which causes the kernel to
      consume them in ip{,6}_forward_finish().
      
      Patch #1 renames 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark'. When
      set, the field indicates that a packet was already forwarded in L3
      (unicast / multicast) by a capable device.
      
      Patch #2 teaches the kernel to consume unicast packets that have
      'offload_l3_fwd_mark' set.
      
      Patch #3 changes mlxsw to mirror loopbacked (iRIF == eRIF) packets,
      instead of trapping them.
      
      Patch #4 adds a test case for the above-mentioned scenario.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: Add one-armed router test · b6f153d3
      Ido Schimmel committed
      Construct a "one-armed router" topology and test that packets are
      forwarded by the ASIC and that a copy of the packet is sent to the
      kernel, which does not forward the packet again.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Mirror loopbacked packets instead of trapping them · 2f4f4494
      Ido Schimmel committed
      When the ASIC detects that a unicast packet is routed through the same
      router interface (RIF) from which it ingressed (iRIF == eRIF), it raises
      a trap called loopback error (LBERROR).
      
      Thus far, this trap was configured to send a sole copy of the packet to
      the CPU so that ICMP redirect packets could be potentially generated by
      the kernel.
      
      This is problematic as the CPU cannot forward packets at 3.2Tb/s and
      there are scenarios (e.g., "one-armed router") where iRIF == eRIF is not
      an exception.
      
      Solve this by having the ASIC forward the packet itself and send
      only a copy of it to the CPU. To prevent the kernel from forwarding
      the packet again, it is marked with 'offload_l3_fwd_mark'.
      
      The trap is configured in a trap group of its own with a dedicated
      policer in order not to prevent packets trapped by other traps from
      reaching the CPU.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Do not route unicast IP packets twice · f839a6c9
      Ido Schimmel committed
      Packets marked with 'offload_l3_fwd_mark' were already forwarded by a
      capable device and should not be forwarded again by the kernel.
      Therefore, have the kernel consume them.
      
      The check is performed in ip{,6}_forward_finish() in order to allow the
      kernel to process such packets in ip{,6}_forward() and generate required
      exceptions, such as ICMP redirects.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
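
The split between ip{,6}_forward() and ip{,6}_forward_finish() described in this commit can be illustrated with a small user-space model. This is a sketch under stated assumptions and not the kernel patch: struct pkt_model, the redirect_needed flag, enum fwd_result and both functions are hypothetical stand-ins; only the field name offload_l3_fwd_mark comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the sk_buff state that matters here. */
struct pkt_model {
	bool offload_l3_fwd_mark; /* set by the driver: hardware already forwarded */
	bool redirect_needed;     /* stand-in for the checks made in ip_forward() */
};

enum fwd_result { FWD_TRANSMITTED, FWD_CONSUMED };

/* Stand-in for ip{,6}_forward(): marked packets still pass through here,
 * so exceptions such as ICMP redirects can be generated for them. */
static bool forward_generates_redirect(const struct pkt_model *p)
{
	return p->redirect_needed;
}

/* Stand-in for ip{,6}_forward_finish(): the software copy of a packet that
 * hardware already forwarded is consumed instead of being sent again. */
static enum fwd_result forward_finish(const struct pkt_model *p)
{
	if (p->offload_l3_fwd_mark)
		return FWD_CONSUMED;
	return FWD_TRANSMITTED;
}
```

Placing the check in the finish step rather than at the start of forwarding is what keeps the exception path working for mirrored packets.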
    • skbuff: Rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark' · 875e8939
      Ido Schimmel committed
      Commit abf4bb6b ("skbuff: Add the offload_mr_fwd_mark field") added
      the 'offload_mr_fwd_mark' field to indicate that a packet has already
      undergone L3 multicast routing by a capable device. The field is used to
      prevent the kernel from forwarding a packet through a netdev through
      which the device has already forwarded the packet.
      
      Currently, no unicast packet is routed by both the device and the
      kernel, but this is about to change by subsequent patches and we need to
      be able to mark such packets, so that they will not be forwarded twice.
      
      Instead of adding yet another field to 'struct sk_buff', we can just
      rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark', as a packet
      either has a multicast or a unicast destination IP.
      
      While at it, add a comment about both 'offload_fwd_mark' and
      'offload_l3_fwd_mark'.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 04 Dec, 2018 29 commits