1. 25 Apr, 2016 11 commits
    • ixgbevf: refactor ethtool stats handling · d72d6c19
      Emil Tantilov committed
      This brings the logic closer to how we handle the stats in ixgbe and it
      sets us up for introducing per-queue stats.
      
      Use IXGBEVF_STAT and IXGBEVF_NETDEV_STAT for accessing the driver and
      netdev stats respectively. This way we don't have to calculate the
      stats from register values, which could leave the counters
      improperly initialized when the interface is down.
      
      IXGBEVF_QUEUE_STATS_LEN is set to include the number of queues.
      
      Also some defines were renamed to use the IXGBEVF prefix.
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
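The table-driven approach described in this commit can be sketched as follows. This is only an illustrative model in Python with `ctypes`, not the driver's C code: the structure layout, the field names, and the `stat()` helper are hypothetical stand-ins for what macros such as IXGBEVF_STAT might record (a name, a field size, and a field offset). Reading counters straight from the structure works whether or not the interface is up, because no register arithmetic is involved.

```python
import ctypes
import sys


class AdapterStats(ctypes.Structure):
    # Hypothetical driver-private counters; not the real ixgbevf layout.
    _fields_ = [
        ("restart_queue", ctypes.c_uint64),
        ("tx_timeout_count", ctypes.c_uint32),
        ("alloc_rx_page_failed", ctypes.c_uint64),
    ]


def stat(name, struct_type, field):
    """Build one table entry: (ethtool string, field size, field offset)."""
    descriptor = getattr(struct_type, field)
    return (name, descriptor.size, descriptor.offset)


# The stats table: one entry per counter, no per-counter code.
STATS = [
    stat("tx_restart_queue", AdapterStats, "restart_queue"),
    stat("tx_timeout_count", AdapterStats, "tx_timeout_count"),
    stat("alloc_rx_page_failed", AdapterStats, "alloc_rx_page_failed"),
]


def read_stats(stats_struct):
    """Read every counter by offset and size from the raw structure bytes."""
    raw = bytes(stats_struct)
    return {
        name: int.from_bytes(raw[offset:offset + size], sys.byteorder)
        for name, size, offset in STATS
    }


adapter = AdapterStats()          # freshly zeroed, like an interface that is down
print(read_stats(adapter))
adapter.restart_queue = 3
adapter.tx_timeout_count = 1
print(read_stats(adapter))
```

Adding a counter in this scheme means adding one table entry, which is what makes a later per-queue extension straightforward.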
    • ixgbe: Add register wait for slow links · 2f2219be
      Mark Rustad committed
      Use a new register to wait for previous register writes to complete
      before issuing a register read. This is needed when slower links
      are in use.
      Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: make 'action' field in struct ixgbe_fdir_filter a u64 value · 2a9ed5d1
      Sridhar Samudrala committed
      This field is used to record the RX queue index for a redirect action
      passed via ring_cookie field in struct ethtool_rx_flow_spec which is
      a u64 value.
      
      For example, after adding a filter rule that redirects to a VF using ethtool:
        # echo 4 > /sys/class/net/p4p1/device/sriov_numvfs
        # ethtool -N p4p1 flow-type ip4 src-ip 192.168.0.1 action 0x100000000
      
      querying the rule shows the Action as 'Direct to queue 0':
      
        # ethtool -n p4p1
        4 RX rings available
        Total 1 rules
      
        Filter: 2045
       	Rule Type: Raw IPv4
      	Src IP addr: 192.168.0.1 mask: 0.0.0.0
      	Dest IP addr: 0.0.0.0 mask: 255.255.255.255
      	TOS: 0x0 mask: 0xff
      	Protocol: 0 mask: 0xff
      	L4 bytes: 0x0 mask: 0xffffffff
      	VLAN EtherType: 0x0 mask: 0xffff
      	VLAN: 0x0 mask: 0xffff
      	User-defined: 0x0 mask: 0xffffffffffffffff
      	Action: Direct to queue 0
      
      With this fix, ethtool will report the right queue index even for VFs.
      	Action: Direct to queue 4294967296
      
      Here 4294967296 corresponds to 0x100000000.
      'ethtool' still needs to be updated to report the queue index as a hex
      value so that it is more user-friendly and matches the 'action' value
      that is passed when adding the rule.
      Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
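The truncation described in this commit can be reproduced with plain integer arithmetic. The sketch below is a Python model, not driver code. The three mask constants follow the kernel's uapi `ethtool.h`; the helpers `split_ring_cookie()` and `store_action()` are hypothetical, and the 32-bit width used for the "narrow" field is only an illustrative assumption (the commit message does not state the old field width; any width of 32 bits or less gives the same result for this cookie).

```python
# Masks as defined in the kernel's uapi ethtool.h.
ETHTOOL_RX_FLOW_SPEC_RING = 0x00000000FFFFFFFF
ETHTOOL_RX_FLOW_SPEC_RING_VF = 0x000000FF00000000
ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF = 32


def split_ring_cookie(ring_cookie):
    """Return (vf_field, queue) as encoded in a 64-bit ring_cookie."""
    queue = ring_cookie & ETHTOOL_RX_FLOW_SPEC_RING
    vf_field = (ring_cookie & ETHTOOL_RX_FLOW_SPEC_RING_VF) >> ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF
    return vf_field, queue


def store_action(ring_cookie, field_bits):
    """Model storing ring_cookie in an unsigned field of the given width."""
    return ring_cookie & ((1 << field_bits) - 1)


cookie = 0x100000000                 # the 'action' value from the ethtool -N example

narrow = store_action(cookie, 32)    # assumed narrower field: the VF bits are lost
wide = store_action(cookie, 64)      # u64 field: the full cookie survives

print(narrow)                        # 0, reported as 'Direct to queue 0'
print(wide, hex(wide))               # 4294967296 0x100000000
print(split_ring_cookie(wide))       # (1, 0): VF field 1, queue 0
```

Printing the stored value in hex, as the commit message suggests for ethtool, makes the VF field and the queue index readable at a glance.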
    • ixgbe: fix default mac->ops.setup_link for X550EM · 4695886c
      Emil Tantilov committed
      X550EM_a/x did not have a default value for mac->ops.setup_link which
      was causing link issues for backplane devices.
      
      This patch sets mac->ops.setup_link to ixgbe_setup_mac_link_X540 for
      X550EM_a/x, which is also the default for X550. As a result,
      mac->ops.setup_link calls the link setup function for the respective
      PHY type whenever no special function is needed to deal with it.
      Reported-by: Ken Cox <jkc@redhat.com>
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: set VLAN spoof checking unconditionally · d3dec7c7
      Emil Tantilov committed
      Previously the PF driver would only set VLAN spoof checking if
      the VF had created VLANs. This was done by setting and checking
      a counter (vlan_count) whenever a VLAN was created by the VF.
      However, vlan_count can be != 0 while no VLANs are assigned to the
      VF, because the count is incremented every time VLAN 0 is added on
      ifdown/up. As a result, VLAN spoof checking was always set for
      those VFs.
      
      This patch cleans up the logic by unconditionally setting VLAN spoof
      checking based on how the VF is configured (via ip link set ethX vf Y
      spoofchk on/off). This change also resolves an issue where VLAN spoof
      checking could remain set even after the user disabled it: the driver
      enabled VLAN spoof checking every time a VLAN was added to the VF, but
      only allowed the setting to be changed if vlan_count != 0.
      
      Also default_vf_vlan_id and vlans_enabled were removed from the
      vf_data_storage structure since they are not being used in the driver.
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
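The behaviour change described in this commit can be pictured with a small state model. Everything in the sketch below is hypothetical: the `VfModel` class and the `old_*`/`new_*` functions are not ixgbe functions, they only restate the prose (the old logic gated on vlan_count and re-enabled checking on every VLAN add; the new logic follows the user's spoofchk setting).

```python
class VfModel:
    """Hypothetical stand-in for the per-VF state kept by the PF driver."""

    def __init__(self):
        self.spoofchk_enabled = True    # user setting (ip link ... spoofchk on/off)
        self.vlan_count = 0             # legacy counter described in the commit
        self.hw_vlan_antispoof = False  # what the hardware is programmed to do


# Old behaviour, as described in the commit message.
def old_add_vlan(vf):
    vf.vlan_count += 1
    vf.hw_vlan_antispoof = True         # enabled on every VLAN add


def old_set_spoofchk(vf, enable):
    vf.spoofchk_enabled = enable
    if vf.vlan_count != 0:              # setting only applied when the counter is non-zero
        vf.hw_vlan_antispoof = enable


# New behaviour: hardware always follows the configured setting.
def new_add_vlan(vf):
    vf.hw_vlan_antispoof = vf.spoofchk_enabled


def new_set_spoofchk(vf, enable):
    vf.spoofchk_enabled = enable
    vf.hw_vlan_antispoof = enable


old_vf = VfModel()
old_set_spoofchk(old_vf, False)         # user disables spoof checking
old_add_vlan(old_vf)                    # a later VLAN add turns it back on
print(old_vf.spoofchk_enabled, old_vf.hw_vlan_antispoof)   # False True

new_vf = VfModel()
new_set_spoofchk(new_vf, False)
new_add_vlan(new_vf)
print(new_vf.spoofchk_enabled, new_vf.hw_vlan_antispoof)   # False False
```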
    • ixgbe: consolidate the configuration of spoof checking · 77f192af
      Emil Tantilov committed
      Consolidate the logic behind configuring spoof checking:
      
      Move the setting of the MAC, VLAN and Ethertype spoof checking into
      ixgbe_ndo_set_vf_spoofchk().
      
      Change ixgbe_set_mac_anti_spoofing() to set MAC spoofing per VF similar
      to the VLAN and Ethertype functions - this allows us to call the helper
      functions in ixgbe_ndo_set_vf_spoofchk() for all spoof check types and
      only disable MAC spoof checking when creating MACVLAN.
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • tcp-tso: do not split TSO packets at retransmit time · 10d3be56
      Eric Dumazet committed
      Linux TCP stack painfully segments all TSO/GSO packets before retransmits.
      
      This was fine back in the days when TSO/GSO were emerging, with their
      bugs, but we believe the dark age is over.
      
      Keeping big packets in write queues, and also during stack traversal,
      has a lot of benefits:
       - Less memory overhead, because write queues have fewer skbs
       - Less cpu overhead at ACK processing.
       - Better SACK processing, as a lot of studies mentioned how
         awful Linux was at this ;)
       - Less cpu overhead to send the rtx packets
         (IP stack traversal, netfilter traversal, drivers...)
       - Better latencies in presence of losses.
       - Smaller spikes in fq like packet schedulers, as retransmits
         are not constrained by TCP Small Queues.
      
      1% packet loss is common today, and at 100Gbit speeds this
      translates to ~80,000 losses per second.
      Losses are often correlated, and we see many retransmit events
      leading to trains of 1-MSS packets at a time when hosts are already
      under stress.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
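The "~80,000 losses per second" figure in this commit can be checked with a back-of-the-envelope calculation. The commit message does not state a packet size, so the 1500-byte packet below is an assumption, and framing overhead is ignored.

```python
LINK_BITS_PER_SECOND = 100e9   # 100Gbit link, from the commit message
PACKET_BYTES = 1500            # assumption: MTU-sized packets, no framing overhead
LOSS_RATE = 0.01               # 1% packet loss, from the commit message

packets_per_second = LINK_BITS_PER_SECOND / (PACKET_BYTES * 8)
losses_per_second = packets_per_second * LOSS_RATE

print(round(packets_per_second))   # 8333333
print(round(losses_per_second))    # 83333
```

Under these assumptions the link carries about 8.3 million packets per second, so a 1% loss rate gives roughly 83,000 lost packets per second, in line with the rounded figure quoted above.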
    • tipc: fix stale links after re-enabling bearer · 8cee83dd
      Parthasarathy Bhuvaragan committed
      Commit 42b18f60 ("tipc: refactor function tipc_link_timeout()")
      introduced a bug which prevents sending of probe messages during
      the link synchronization phase. This leads to hanging links if the
      bearer is disabled and re-enabled after links are up.
      
      In this commit, we send the probe messages correctly.
      
      Fixes: 42b18f60 ("tipc: refactor function tipc_link_timeout()")
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-tcstamp_ack-frag-coalesce' · 6a74c196
      David S. Miller committed
      Martin KaFai Lau says:
      
      ====================
      tcp: Handle txstamp_ack when fragmenting/coalescing skbs
      
      This patchset handles the txstamp-ack bit when
      fragmenting/coalescing skbs.
      
      The second patch depends on the recently posted series
      for the net branch:
      "tcp: Merge timestamp info when coalescing skbs"
      
      A BPF program attaches a kprobe to sock_queue_err_skb()
      and prints the value of serr->ee.ee_data. The BPF
      program (runnable from bcc) is included here:
      
      BPF prog used for testing:
      ~~~~~
      
      from __future__ import print_function
      from bcc import BPF
      
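      bpf_text = """
      /* Kernel headers for struct pt_regs, struct sock, and SKB_EXT_ERR() */
      #include <uapi/linux/ptrace.h>
      #include <net/sock.h>
      #include <linux/errqueue.h>
      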
      int trace_err_skb(struct pt_regs *ctx)
      {
      	struct sk_buff *skb = (struct sk_buff *)ctx->si;
      	struct sock *sk = (struct sock *)ctx->di;
      	struct sock_exterr_skb *serr;
      	u32 ee_data = 0;
      
      	if (!sk || !skb)
      		return 0;
      
      	serr = SKB_EXT_ERR(skb);
      	bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
      	bpf_trace_printk("ee_data:%u\\n", ee_data);
      
      	return 0;
      };
      """
      
      b = BPF(text=bpf_text)
      b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
      print("Attached to kprobe")
      b.trace_print()
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Merge txstamp_ack in tcp_skb_collapse_tstamp · 2de8023e
      Martin KaFai Lau committed
      When collapsing skbs, txstamp_ack also needs to be merged.
      
      Retrans Collapse Test:
      ~~~~~~
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      0.200 write(4, ..., 11680) = 11680
      
      0.200 > P. 1:731(730) ack 1
      0.200 > P. 731:1461(730) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:13141(4380) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 13141 win 257
      
      BPF Output Before:
      ~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~
      <...>-2027  [007] d.s.    79.765921: : ee_data:1459
      
      Sacks Collapse Test:
      ~~~~~
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 1460) = 1460
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 13140) = 13140
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      
      0.200 > P. 1:1461(1460) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:14601(5840) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 14601 win 257
      
      BPF Output Before:
      ~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~
      <...>-2049  [007] d.s.    89.185538: : ee_data:14599
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
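The packetdrill scripts in this commit pass raw numbers to `setsockopt(4, SOL_SOCKET, 37, [...], 4)`. Option 37 is SO_TIMESTAMPING on most architectures, and the values 2688 and 2176 are bitmasks of SOF_TIMESTAMPING_* flags. The sketch below decodes them; the bit positions follow the kernel's uapi `net_tstamp.h` as of this series, while the `decode_timestamping_flags()` helper is a hypothetical name introduced for illustration.

```python
SO_TIMESTAMPING = 37  # asm-generic value; a few architectures use a different number

# Flag bits as listed in the kernel's uapi net_tstamp.h (bits 0 through 11).
SOF_TIMESTAMPING_FLAGS = {
    1 << 0: "SOF_TIMESTAMPING_TX_HARDWARE",
    1 << 1: "SOF_TIMESTAMPING_TX_SOFTWARE",
    1 << 2: "SOF_TIMESTAMPING_RX_HARDWARE",
    1 << 3: "SOF_TIMESTAMPING_RX_SOFTWARE",
    1 << 4: "SOF_TIMESTAMPING_SOFTWARE",
    1 << 5: "SOF_TIMESTAMPING_SYS_HARDWARE",
    1 << 6: "SOF_TIMESTAMPING_RAW_HARDWARE",
    1 << 7: "SOF_TIMESTAMPING_OPT_ID",
    1 << 8: "SOF_TIMESTAMPING_TX_SCHED",
    1 << 9: "SOF_TIMESTAMPING_TX_ACK",
    1 << 10: "SOF_TIMESTAMPING_OPT_CMSG",
    1 << 11: "SOF_TIMESTAMPING_OPT_TSONLY",
}


def decode_timestamping_flags(value):
    """Return the names of the flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SOF_TIMESTAMPING_FLAGS.items()) if value & bit]


for value in (2688, 2176):
    print(value, hex(value), decode_timestamping_flags(value))
```

Decoded this way, 2688 requests an ACK timestamp (TX_ACK together with OPT_ID and OPT_TSONLY), while 2176 keeps OPT_ID and OPT_TSONLY but drops TX_ACK. Only the data written while 2688 is in effect should therefore produce an SCM_TSTAMP_ACK event.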
    • tcp: Carry txstamp_ack in tcp_fragment_tstamp · b51e13fa
      Martin KaFai Lau committed
      When a tcp skb is sliced into two smaller skbs (e.g. in
      tcp_fragment() and tso_fragment()), the txstamp_ack bit is not
      carried to the newly created skb even when it is needed there.
      As a result, a timestamping event (SCM_TSTAMP_ACK) is
      missing from the sk->sk_error_queue.
      
      This patch carries this bit to the new skb2
      in tcp_fragment_tstamp().
      
      BPF Output Before:
      ~~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~~
      <...>-2050  [000] d.s.   100.928763: : ee_data:14599
      
      Packetdrill Script:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 14600) = 14600
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      
      0.200 > . 1:7301(7300) ack 1
      0.200 > P. 7301:14601(7300) ack 1
      
      0.300 < . 1:1(0) ack 14601 win 257
      
      0.300 close(4) = 0
      0.300 > F. 14601:14601(0) ack 1
      0.400 < F. 1:1(0) ack 16062 win 257
      0.400 > . 14602:14602(0) ack 2
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
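The effect of carrying the bit when an skb is split can be sketched with a toy model. This is a simplified Python illustration, not the kernel's tcp_fragment_tstamp(): the `Skb` class, its fields, and the `carry_txstamp_ack` switch are hypothetical, byte offsets start at 0, and sequence-number wraparound is ignored. It only tracks where the timestamp key and the txstamp_ack bit end up after a split.

```python
from dataclasses import dataclass


@dataclass
class Skb:
    seq: int                    # first byte offset covered by this skb
    end_seq: int                # one past the last byte covered
    tskey: int = 0              # byte offset the timestamp request refers to
    tx_tstamp: bool = False     # stands in for the tx timestamp flags
    txstamp_ack: bool = False   # the bit this commit is about


def fragment(skb, split_len, carry_txstamp_ack=True):
    """Split skb after split_len bytes and return the new second half."""
    split_seq = skb.seq + split_len
    skb2 = Skb(seq=split_seq, end_seq=skb.end_seq)
    skb.end_seq = split_seq
    # If the timestamped byte now lives in the second half, move the request there.
    if skb.tx_tstamp and skb.tskey >= skb2.seq:
        skb2.tx_tstamp, skb.tx_tstamp = True, False
        skb2.tskey, skb.tskey = skb.tskey, skb2.tskey
        if carry_txstamp_ack:   # the step described by this commit
            skb2.txstamp_ack, skb.txstamp_ack = skb.txstamp_ack, False
    return skb2


# One 14600-byte write with an ACK timestamp requested on its last byte.
before = Skb(seq=0, end_seq=14600, tskey=14599, tx_tstamp=True, txstamp_ack=True)
before_half = fragment(before, 7300, carry_txstamp_ack=False)
print(before_half.tskey, before_half.txstamp_ack)   # 14599 False

after = Skb(seq=0, end_seq=14600, tskey=14599, tx_tstamp=True, txstamp_ack=True)
after_half = fragment(after, 7300)
print(after_half.tskey, after_half.txstamp_ack)     # 14599 True
```

In the first case the key moves to the second half but the bit stays behind, which matches the missing SCM_TSTAMP_ACK event reported under "BPF Output Before". In the second case both travel together, consistent with the `ee_data:14599` line under "BPF Output After".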
  2. 24 Apr, 2016 12 commits
  3. 22 Apr, 2016 17 commits