1. 16 10月, 2018 7 次提交
    • E
      net: extend sk_pacing_rate to unsigned long · 76a9ebe8
      Eric Dumazet 提交于
      sk_pacing_rate has beed introduced as a u32 field in 2013,
      effectively limiting per flow pacing to 34Gbit.
      
      We believe it is time to allow TCP to pace high speed flows
      on 64bit hosts, as we now can reach 100Gbit on one TCP flow.
      
      This patch adds no cost for 32bit kernels.
      
      The tcpi_pacing_rate and tcpi_max_pacing_rate were already
      exported as 64bit, so iproute2/ss command require no changes.
      
      Unfortunately the SO_MAX_PACING_RATE socket option will stay
      32bit and we will need to add a new option to let applications
      control high pacing rates.
      
      State      Recv-Q Send-Q Local Address:Port             Peer Address:Port
      ESTAB      0      1787144  10.246.9.76:49992             10.246.9.77:36741
                       timer:(on,003ms,0) ino:91863 sk:2 <->
       skmem:(r0,rb540000,t66440,tb2363904,f605944,w1822984,o0,bl0,d0)
       ts sack bbr wscale:8,8 rto:201 rtt:0.057/0.006 mss:1448
       rcvmss:536 advmss:1448
       cwnd:138 ssthresh:178 bytes_acked:256699822585 segs_out:177279177
       segs_in:3916318 data_segs_out:177279175
       bbr:(bw:31276.8Mbps,mrtt:0,pacing_gain:1.25,cwnd_gain:2)
       send 28045.5Mbps lastrcv:73333
       pacing_rate 38705.0Mbps delivery_rate 22997.6Mbps
       busy:73333ms unacked:135 retrans:0/157 rcv_space:14480
       notsent:2085120 minrtt:0.013
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76a9ebe8
    • E
      tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh · 5f6188a8
      Eric Dumazet 提交于
      In EDT design, I made the mistake of using tcp_wstamp_ns
      to store the last tcp_clock_ns() sample and to store the
      pacing virtual timer.
      
      This causes major regressions at high speed flows.
      
      Introduce tcp_clock_cache to store last tcp_clock_ns().
      This is needed because some arches have slow high-resolution
      kernel time service.
      
      tcp_wstamp_ns is only updated when a packet is sent.
      
      Note that we can remove tcp_mstamp in the future since
      tcp_mstamp is essentially tcp_clock_cache/1000, so the
      apparent socket size increase is temporary.
      
      Fixes: 9799ccb0 ("tcp: add tcp_wstamp_ns socket field")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f6188a8
    • L
      net: bridge: fix a possible memory leak in __vlan_add · 1a3aea25
      Li RongQing 提交于
      After per-port vlan stats, vlan stats should be released
      when fail to add vlan
      
      Fixes: 9163a0fc ("net: bridge: add support for per-port vlan stats")
      CC: bridge@lists.linux-foundation.org
      cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NZhang Yu <zhangyu31@baidu.com>
      Signed-off-by: NLi RongQing <lirongqing@baidu.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a3aea25
    • D
      rxrpc: Add /proc/net/rxrpc/peers to display peer list · bc0e7cf4
      David Howells 提交于
      Add /proc/net/rxrpc/peers to display the list of peers currently active.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc0e7cf4
    • J
      net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI command · 9771b8cc
      Justin.Lee1@Dell.com 提交于
      The new command (NCSI_CMD_SEND_CMD) is added to allow user space application
      to send NC-SI command to the network card.
      Also, add a new attribute (NCSI_ATTR_DATA) for transferring request and response.
      
      The work flow is as below.
      
      Request:
      User space application
      	-> Netlink interface (msg)
      	-> new Netlink handler - ncsi_send_cmd_nl()
      	-> ncsi_xmit_cmd()
      
      Response:
      Response received - ncsi_rcv_rsp()
      	-> internal response handler - ncsi_rsp_handler_xxx()
      	-> ncsi_rsp_handler_netlink()
      	-> ncsi_send_netlink_rsp ()
      	-> Netlink interface (msg)
      	-> user space application
      
      Command timeout - ncsi_request_timeout()
      	-> ncsi_send_netlink_timeout ()
      	-> Netlink interface (msg with zero data length)
      	-> user space application
      
      Error:
      Error detected
      	-> ncsi_send_netlink_err ()
      	-> Netlink interface (err msg)
      	-> user space application
      Signed-off-by: NJustin Lee <justin.lee1@dell.com>
      Reviewed-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9771b8cc
    • H
      tipc: support binding to specific ip address when activating UDP bearer · acad76a5
      Hoang Le 提交于
      INADDR_ANY is hard-coded when activating UDP bearer. So, we could not
      bind to a specific IP address even with replicast mode using - given
      remote ip address instead of using multicast ip address.
      
      In this commit, we fixed it by checking and switch to use appropriate
      local ip address.
      
      before:
      $netstat -plu
      Active Internet connections (only servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address
      udp        0      0 **0.0.0.0:6118**            0.0.0.0:*
      
      after:
      $netstat -plu
      Active Internet connections (only servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address
      udp        0      0 **10.0.0.2:6118**           0.0.0.0:*
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NHoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acad76a5
    • M
      FDDI: defza: Support capturing outgoing SMT traffic · 9f9a742d
      Maciej W. Rozycki 提交于
      DEC FDDIcontroller 700 (DEFZA) uses a Tx/Rx queue pair to communicate
      SMT frames with adapter's firmware.  Any SMT frame received from the RMC
      via the Rx queue is queued back by the driver to the SMT Rx queue for
      the firmware to process.  Similarly the firmware uses the SMT Tx queue
      to supply the driver with SMT frames which are queued back to the Tx
      queue for the RMC to send to the ring.
      
      When a network tap is attached to an FDDI interface handled by `defza'
      any incoming SMT frames captured are queued to our usual processing of
      network data received, which in turn delivers them to any listening
      taps.
      
      However the outgoing SMT frames produced by the firmware bypass our
      network protocol stack and are therefore not delivered to taps.  This in
      turn means that taps are missing a part of network traffic sent by the
      adapter, which may make it more difficult to track down network problems
      or do general traffic analysis.
      
      Call `dev_queue_xmit_nit' then in the SMT Tx path, having checked that
      a network tap is attached, with a newly-created `dev_nit_active' helper
      wrapping the usual condition used in the transmit path.
      Signed-off-by: NMaciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f9a742d
  2. 13 10月, 2018 3 次提交
    • N
      net: bridge: add support for per-port vlan stats · 9163a0fc
      Nikolay Aleksandrov 提交于
      This patch adds an option to have per-port vlan stats instead of the
      default global stats. The option can be set only when there are no port
      vlans in the bridge since we need to allocate the stats if it is set
      when vlans are being added to ports (and respectively free them
      when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the
      largest user of options. The current stats design allows us to add
      these without any changes to the fast-path, it all comes down to
      the per-vlan stats pointer which, if this option is enabled, will
      be allocated for each port vlan instead of using the global bridge-wide
      one.
      
      CC: bridge@lists.linux-foundation.org
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9163a0fc
    • D
      net: Evict neighbor entries on carrier down · 859bd2ef
      David Ahern 提交于
      When a link's carrier goes down it could be a sign of the port changing
      networks. If the new network has overlapping addresses with the old one,
      then the kernel will continue trying to use neighbor entries established
      based on the old network until the entries finally age out - meaning a
      potentially long delay with communications not working.
      
      This patch evicts neighbor entries on carrier down with the exception of
      those marked permanent. Permanent entries are managed by userspace (either
      an admin or a routing daemon such as FRR).
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      859bd2ef
    • D
      net/ipv6: Add knob to skip DELROUTE message on device down · 7c6bb7d2
      David Ahern 提交于
      Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE
      notifications when a device is taken down (admin down) or deleted. IPv4
      does not generate a message for routes evicted by the down or delete;
      IPv6 does. A NOS at scale really needs to avoid these messages and have
      IPv4 and IPv6 behave similarly, relying on userspace to handle link
      notifications and evict the routes.
      
      At this point existing user behavior needs to be preserved. Since
      notifications are a global action (not per app) the only way to preserve
      existing behavior and allow the messages to be skipped is to add a new
      sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to
      disable the notificatioons.
      
      IPv6 route code already supports the option to skip the message (it is
      used for multipath routes for example). Besides the new sysctl we need
      to pass the skip_notify setting through the generic fib6_clean and
      fib6_walk functions to fib6_clean_node and to set skip_notify on calls
      to __ip_del_rt for the addrconf_ifdown path.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c6bb7d2
  3. 12 10月, 2018 4 次提交
    • A
      mac80211: implement ieee80211_tx_rate_update to update rate · f8252e7b
      Anilkumar Kolli 提交于
      Current mac80211 has provision to update tx status through
      ieee80211_tx_status() and ieee80211_tx_status_ext(). But
      drivers like ath10k updates the tx status from the skb except
      txrate, txrate will be updated from a different path, peer stats.
      
      Using ieee80211_tx_status_ext() in two different paths
      (one for the stats, one for the tx rate) would duplicate
      the stats instead.
      
      To avoid this stats duplication, ieee80211_tx_rate_update()
      is implemented.
      Signed-off-by: NAnilkumar Kolli <akolli@codeaurora.org>
      [minor commit message editing, use initializers in code]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      f8252e7b
    • A
      nl80211: Add per peer statistics to compute FCS error rate · 0d4e14a3
      Ankita Bajaj 提交于
      Add support for drivers to report the total number of MPDUs received
      and the number of MPDUs received with an FCS error from a specific
      peer. These counters will be incremented only when the TA of the
      frame matches the MAC address of the peer irrespective of FCS
      error.
      
      It should be noted that the TA field in the frame might be corrupted
      when there is an FCS error and TA matching logic would fail in such
      cases. Hence, FCS error counter might not be fully accurate, but it can
      provide help in detecting bad RX links in significant number of cases.
      This FCS error counter without full accuracy can be used, e.g., to
      trigger a kick-out of a connected client with a bad link in AP mode to
      force such a client to roam to another AP.
      Signed-off-by: NAnkita Bajaj <bankita@codeaurora.org>
      Signed-off-by: NJouni Malinen <jouni@codeaurora.org>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      0d4e14a3
    • P
      mac80211: support FTM responder configuration/statistics · bc847970
      Pradeep Kumar Chitrapu 提交于
      New bss param ftm_responder is used to notify the driver to
      enable fine timing request (FTM) responder role in AP mode.
      
      Plumb the new cfg80211 API for FTM responder statistics through to
      the driver API in mac80211.
      Signed-off-by: NDavid Spinadel <david.spinadel@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NPradeep Kumar Chitrapu <pradeepc@codeaurora.org>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      bc847970
    • Y
      tipc: eliminate possible recursive locking detected by LOCKDEP · a1f8dd34
      Ying Xue 提交于
      When booting kernel with LOCKDEP option, below warning info was found:
      
      WARNING: possible recursive locking detected
      4.19.0-rc7+ #14 Not tainted
      --------------------------------------------
      swapper/0/1 is trying to acquire lock:
      00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh
      include/linux/spinlock.h:334 [inline]
      00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at:
      tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850
      
      but task is already holding lock:
      00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh
      include/linux/spinlock.h:334 [inline]
      00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
      tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&list->lock)->rlock#4);
        lock(&(&list->lock)->rlock#4);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by swapper/0/1:
       #0: 00000000f7539d34 (pernet_ops_rwsem){+.+.}, at:
      register_pernet_subsys+0x19/0x40 net/core/net_namespace.c:1051
       #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
      spin_lock_bh include/linux/spinlock.h:334 [inline]
       #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
      tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849
      
      stack backtrace:
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc7+ #14
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1af/0x295 lib/dump_stack.c:113
       print_deadlock_bug kernel/locking/lockdep.c:1759 [inline]
       check_deadlock kernel/locking/lockdep.c:1803 [inline]
       validate_chain kernel/locking/lockdep.c:2399 [inline]
       __lock_acquire+0xf1e/0x3c60 kernel/locking/lockdep.c:3411
       lock_acquire+0x1db/0x520 kernel/locking/lockdep.c:3900
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
       _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
       spin_lock_bh include/linux/spinlock.h:334 [inline]
       tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850
       tipc_link_bc_create+0xb5/0x1f0 net/tipc/link.c:526
       tipc_bcast_init+0x59b/0xab0 net/tipc/bcast.c:521
       tipc_init_net+0x472/0x610 net/tipc/core.c:82
       ops_init+0xf7/0x520 net/core/net_namespace.c:129
       __register_pernet_operations net/core/net_namespace.c:940 [inline]
       register_pernet_operations+0x453/0xac0 net/core/net_namespace.c:1011
       register_pernet_subsys+0x28/0x40 net/core/net_namespace.c:1052
       tipc_init+0x83/0x104 net/tipc/core.c:140
       do_one_initcall+0x109/0x70a init/main.c:885
       do_initcall_level init/main.c:953 [inline]
       do_initcalls init/main.c:961 [inline]
       do_basic_setup init/main.c:979 [inline]
       kernel_init_freeable+0x4bd/0x57f init/main.c:1144
       kernel_init+0x13/0x180 init/main.c:1063
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
      
      The reason why the noise above was complained by LOCKDEP is because we
      nested to hold l->wakeupq.lock and l->inputq->lock in tipc_link_reset
      function. In fact it's unnecessary to move skb buffer from l->wakeupq
      queue to l->inputq queue while holding the two locks at the same time.
      Instead, we can move skb buffers in l->wakeupq queue to a temporary
      list first and then move the buffers of the temporary list to l->inputq
      queue, which is also safe for us.
      
      Fixes: 3f32d0be ("tipc: lock wakeup & inputq at tipc_link_reset()")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1f8dd34
  4. 11 10月, 2018 26 次提交