1. 26 4月, 2016 11 次提交
  2. 25 4月, 2016 4 次提交
    • E
      tcp-tso: do not split TSO packets at retransmit time · 10d3be56
      Eric Dumazet 提交于
      Linux TCP stack painfully segments all TSO/GSO packets before retransmits.
      
      This was fine back in the days when TSO/GSO were emerging, with their
      bugs, but we believe the dark age is over.
      
      Keeping big packets in write queues, but also in stack traversal
      has a lot of benefits.
       - Less memory overhead, because write queues have less skbs
       - Less cpu overhead at ACK processing.
       - Better SACK processing, as lot of studies mentioned how
         awful linux was at this ;)
       - Less cpu overhead to send the rtx packets
         (IP stack traversal, netfilter traversal, drivers...)
       - Better latencies in presence of losses.
       - Smaller spikes in fq like packet schedulers, as retransmits
         are not constrained by TCP Small Queues.
      
      1 % packet losses are common today, and at 100Gbit speeds, this
      translates to ~80,000 losses per second.
      Losses are often correlated, and we see many retransmit events
      leading to 1-MSS train of packets, at the time hosts are already
      under stress.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10d3be56
    • P
      tipc: fix stale links after re-enabling bearer · 8cee83dd
      Parthasarathy Bhuvaragan 提交于
      Commit 42b18f60 ("tipc: refactor function tipc_link_timeout()"),
      introduced a bug which prevents sending of probe messages during
      link synchronization phase. This leads to hanging links, if the
      bearer is disabled/enabled after links are up.
      
      In this commit, we send the probe messages correctly.
      
      Fixes: 42b18f60 ("tipc: refactor function tipc_link_timeout()")
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cee83dd
    • M
      tcp: Merge txstamp_ack in tcp_skb_collapse_tstamp · 2de8023e
      Martin KaFai Lau 提交于
      When collapsing skbs, txstamp_ack also needs to be merged.
      
      Retrans Collapse Test:
      ~~~~~~
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      0.200 write(4, ..., 11680) = 11680
      
      0.200 > P. 1:731(730) ack 1
      0.200 > P. 731:1461(730) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:13141(4380) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 13141 win 257
      
      BPF Output Before:
      ~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~
      <...>-2027  [007] d.s.    79.765921: : ee_data:1459
      
      Sacks Collapse Test:
      ~~~~~
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 1460) = 1460
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 13140) = 13140
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      
      0.200 > P. 1:1461(1460) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:14601(5840) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 14601 win 257
      
      BPF Output Before:
      ~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~
      <...>-2049  [007] d.s.    89.185538: : ee_data:14599
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2de8023e
    • M
      tcp: Carry txstamp_ack in tcp_fragment_tstamp · b51e13fa
      Martin KaFai Lau 提交于
      When a tcp skb is sliced into two smaller skbs (e.g. in
      tcp_fragment() and tso_fragment()),  it does not carry
      the txstamp_ack bit to the newly created skb if it is needed.
      The end result is a timestamping event (SCM_TSTAMP_ACK) will
      be missing from the sk->sk_error_queue.
      
      This patch carries this bit to the new skb2
      in tcp_fragment_tstamp().
      
      BPF Output Before:
      ~~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~~
      <...>-2050  [000] d.s.   100.928763: : ee_data:14599
      
      Packetdrill Script:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 14600) = 14600
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      
      0.200 > . 1:7301(7300) ack 1
      0.200 > P. 7301:14601(7300) ack 1
      
      0.300 < . 1:1(0) ack 14601 win 257
      
      0.300 close(4) = 0
      0.300 > F. 14601:14601(0) ack 1
      0.400 < F. 1:1(0) ack 16062 win 257
      0.400 > . 14602:14602(0) ack 2
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b51e13fa
  3. 24 4月, 2016 4 次提交
  4. 22 4月, 2016 12 次提交
    • S
      openvswitch: use flow protocol when recalculating ipv6 checksums · b4f70527
      Simon Horman 提交于
      When using masked actions the ipv6_proto field of an action
      to set IPv6 fields may be zero rather than the prevailing protocol
      which will result in skipping checksum recalculation.
      
      This patch resolves the problem by relying on the protocol
      in the flow key rather than that in the set field action.
      
      Fixes: 83d2b9ba ("net: openvswitch: Support masked set actions.")
      Cc: Jarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: NSimon Horman <simon.horman@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4f70527
    • A
      net: Add support for IP ID mangling TSO in cases that require encapsulation · 7f348a60
      Alexander Duyck 提交于
      This patch adds support for NETIF_F_TSO_MANGLEID if a given tunnel supports
      NETIF_F_TSO.  This way if needed a device can then later enable the TSO
      with IP ID mangling and the tunnels on top of that device can then also
      make use of the IP ID mangling as well.
      Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f348a60
    • M
      tcp: Merge tx_flags and tskey in tcp_shifted_skb · cfea5a68
      Martin KaFai Lau 提交于
      After receiving sacks, tcp_shifted_skb() will collapse
      skbs if possible.  tx_flags and tskey also have to be
      merged.
      
      This patch reuses the tcp_skb_collapse_tstamp() to handle
      them.
      
      BPF Output Before:
      ~~~~~
      <no-output-due-to-missing-tstamp-event>
      
      BPF Output After:
      ~~~~~
      <...>-2024  [007] d.s.    88.644374: : ee_data:14599
      
      Packetdrill Script:
      ~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 1460) = 1460
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 13140) = 13140
      
      0.200 > P. 1:1461(1460) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:14601(5840) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 14601 win 257
      
      0.400 close(4) = 0
      0.400 > F. 14601:14601(0) ack 1
      0.500 < F. 1:1(0) ack 14602 win 257
      0.500 > . 14602:14602(0) ack 2
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfea5a68
    • M
      tcp: Merge tx_flags and tskey in tcp_collapse_retrans · 082ac2d5
      Martin KaFai Lau 提交于
      If two skbs are merged/collapsed during retransmission, the current
      logic does not merge the tx_flags and tskey.  The end result is
      the SCM_TSTAMP_ACK timestamp could be missing for a packet.
      
      The patch:
      1. Merge the tx_flags
      2. Overwrite the prev_skb's tskey with the next_skb's tskey
      
      BPF Output Before:
      ~~~~~~
      <no-output-due-to-missing-tstamp-event>
      
      BPF Output After:
      ~~~~~~
      packetdrill-2092  [001] d.s.   453.998486: : ee_data:1459
      
      Packetdrill Script:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      0.200 write(4, ..., 11680) = 11680
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      
      0.200 > P. 1:731(730) ack 1
      0.200 > P. 731:1461(730) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:13141(4380) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 13141 win 257
      
      0.400 close(4) = 0
      0.400 > F. 13141:13141(0) ack 1
      0.500 < F. 1:1(0) ack 13142 win 257
      0.500 > . 13142:13142(0) ack 2
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      082ac2d5
    • N
      3d6b66c1
    • N
      a9a08042
    • N
      58414d32
    • P
      NLA_BINARY misuse bug in HSR · f9375729
      Peter Heise 提交于
      Removed .type field from NLA to do proper length checking.
      Reported by Daniel Borkmann and Julia Lawall.
      Signed-off-by: NPeter Heise <peter.heise@airbus.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9375729
    • X
      net: use jiffies_to_msecs to replace EXPIRES_IN_MS in inet/sctp_diag · b7de529c
      Xin Long 提交于
      EXPIRES_IN_MS macro comes from net/ipv4/inet_diag.c and dates
      back to before jiffies_to_msecs() has been introduced.
      
      Now we can remove it and use jiffies_to_msecs().
      Suggested-by: NJakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NJakub Sitnicki <jkbs@redhat.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b7de529c
    • M
      tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks · 479f85c3
      Martin KaFai Lau 提交于
      Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received,
      it could incorrectly think that a skb has already
      been acked and queue a SCM_TSTAMP_ACK cmsg to the
      sk->sk_error_queue.
      
      In tcp_ack_tstamp(), it checks
      'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'.
      If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill
      script, between() returns true but the tskey is actually not acked.
      e.g. try between(3, 2, 1).
      
      The fix is to replace between() with one before() and one !before().
      By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be
      removed.
      
      A packetdrill script is used to reproduce the dup ack scenario.
      Due to the lacking cmsg support in packetdrill (may be I
      cannot find it),  a BPF prog is used to kprobe to
      sock_queue_err_skb() and print out the value of
      serr->ee.ee_data.
      
      Both the packetdrill and the bcc BPF script is attached at the end of
      this commit message.
      
      BPF Output Before Fix:
      ~~~~~~
            <...>-2056  [001] d.s.   433.927987: : ee_data:1459  #incorrect
      packetdrill-2056  [001] d.s.   433.929563: : ee_data:1459  #incorrect
      packetdrill-2056  [001] d.s.   433.930765: : ee_data:1459  #incorrect
      packetdrill-2056  [001] d.s.   434.028177: : ee_data:1459
      packetdrill-2056  [001] d.s.   434.029686: : ee_data:14599
      
      BPF Output After Fix:
      ~~~~~~
            <...>-2049  [000] d.s.   113.517039: : ee_data:1459
            <...>-2049  [000] d.s.   113.517253: : ee_data:14599
      
      BCC BPF Script:
      ~~~~~~
      #!/usr/bin/env python
      
      from __future__ import print_function
      from bcc import BPF
      
      bpf_text = """
      #include <uapi/linux/ptrace.h>
      #include <net/sock.h>
      #include <bcc/proto.h>
      #include <linux/errqueue.h>
      
      #ifdef memset
      #undef memset
      #endif
      
      int trace_err_skb(struct pt_regs *ctx)
      {
      	struct sk_buff *skb = (struct sk_buff *)ctx->si;
      	struct sock *sk = (struct sock *)ctx->di;
      	struct sock_exterr_skb *serr;
      	u32 ee_data = 0;
      
      	if (!sk || !skb)
      		return 0;
      
      	serr = SKB_EXT_ERR(skb);
      	bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
      	bpf_trace_printk("ee_data:%u\\n", ee_data);
      
      	return 0;
      };
      """
      
      b = BPF(text=bpf_text)
      b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
      print("Attached to kprobe")
      b.trace_print()
      
      Packetdrill Script:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 1460) = 1460
      0.200 write(4, ..., 13140) = 13140
      
      0.200 > P. 1:1461(1460) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:14601(5840) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 14601 win 257
      
      0.400 close(4) = 0
      0.400 > F. 14601:14601(0) ack 1
      0.500 < F. 1:1(0) ack 14602 win 257
      0.500 > . 14602:14602(0) ack 2
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      479f85c3
    • V
      net: dsa: remove tag_protocol from dsa_switch · c60c9840
      Vivien Didelot 提交于
      Having the tag protocol in dsa_switch_driver for setup time and in
      dsa_switch_tree for runtime is enough. Remove dsa_switch's one.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c60c9840
    • J
      openvswitch: Orphan skbs before IPv6 defrag · 49e261a8
      Joe Stringer 提交于
      This is the IPv6 counterpart to commit 8282f274 ("inet: frag: Always
      orphan skbs inside ip_defrag()").
      
      Prior to commit 029f7f3b ("netfilter: ipv6: nf_defrag: avoid/free
      clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
      cloned (implicitly orphaning) prior to queueing for reassembly. As such,
      when the IPv6 message is eventually reassembled, the skb->sk for all
      fragments would be NULL. After that commit was introduced, rather than
      cloning, the original skbs were queued directly without orphaning. The
      end result is that all frags except for the first and last may have a
      socket attached.
      
      This commit explicitly orphans such skbs during nf_ct_frag6_gather() to
      prevent BUG_ON(skb->sk) during a later call to ip6_fragment().
      
      kernel BUG at net/ipv6/ip6_output.c:631!
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffffa042c7c0>] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch]
       [<ffffffff810bb8a2>] ? __lock_is_held+0x52/0x70
       [<ffffffffa042c587>] ovs_fragment+0x1f7/0x280 [openvswitch]
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
       [<ffffffff81697ea0>] ? dst_discard_out+0x20/0x20
       [<ffffffff81697e80>] ? dst_ifdown+0x80/0x80
       [<ffffffffa042c703>] do_output.isra.28+0xf3/0x1b0 [openvswitch]
       [<ffffffffa042d279>] do_execute_actions+0x709/0x12c0 [openvswitch]
       [<ffffffffa04340a4>] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch]
       [<ffffffffa04340d1>] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch]
       [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffffa042de75>] ovs_execute_actions+0x45/0x120 [openvswitch]
       [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
       [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffffa042def4>] ovs_execute_actions+0xc4/0x120 [openvswitch]
       [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
       [<ffffffffa04337f2>] ? key_extract+0x442/0xc10 [openvswitch]
       [<ffffffffa043b26d>] ovs_vport_receive+0x5d/0xb0 [openvswitch]
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
       [<ffffffffa043c11d>] internal_dev_xmit+0x6d/0x150 [openvswitch]
       [<ffffffffa043c0b5>] ? internal_dev_xmit+0x5/0x150 [openvswitch]
       [<ffffffff8168fb5f>] dev_hard_start_xmit+0x2df/0x660
       [<ffffffff8168f5ea>] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0
       [<ffffffff81690925>] __dev_queue_xmit+0x8f5/0x950
       [<ffffffff81690080>] ? __dev_queue_xmit+0x50/0x950
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff81690990>] dev_queue_xmit+0x10/0x20
       [<ffffffff8169a418>] neigh_resolve_output+0x178/0x220
       [<ffffffff81752759>] ? ip6_finish_output2+0x219/0x7b0
       [<ffffffff81752759>] ip6_finish_output2+0x219/0x7b0
       [<ffffffff817525a5>] ? ip6_finish_output2+0x65/0x7b0
       [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
       [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
       [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
       [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
       [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
       [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
       [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
       [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
       [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
       [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
       [<ffffffff817796cc>] icmpv6_push_pending_frames+0xac/0xe0
       [<ffffffff8177a4be>] icmpv6_echo_reply+0x42e/0x500
       [<ffffffff8177acbf>] icmpv6_rcv+0x4cf/0x580
       [<ffffffff81755ac7>] ip6_input_finish+0x1a7/0x690
       [<ffffffff81755925>] ? ip6_input_finish+0x5/0x690
       [<ffffffff817567a0>] ip6_input+0x30/0xa0
       [<ffffffff81755920>] ? ip6_rcv_finish+0x1a0/0x1a0
       [<ffffffff817557ce>] ip6_rcv_finish+0x4e/0x1a0
       [<ffffffff8175640f>] ipv6_rcv+0x45f/0x7c0
       [<ffffffff81755fe6>] ? ipv6_rcv+0x36/0x7c0
       [<ffffffff81755780>] ? ip6_make_skb+0x1c0/0x1c0
       [<ffffffff8168b649>] __netif_receive_skb_core+0x229/0xb80
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff8168c07f>] ? process_backlog+0x6f/0x230
       [<ffffffff8168bfb6>] __netif_receive_skb+0x16/0x70
       [<ffffffff8168c088>] process_backlog+0x78/0x230
       [<ffffffff8168c0ed>] ? process_backlog+0xdd/0x230
       [<ffffffff8168db43>] net_rx_action+0x203/0x480
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff817c156e>] __do_softirq+0xde/0x49f
       [<ffffffff81752768>] ? ip6_finish_output2+0x228/0x7b0
       [<ffffffff817c070c>] do_softirq_own_stack+0x1c/0x30
       <EOI>
       [<ffffffff8106f88b>] do_softirq.part.18+0x3b/0x40
       [<ffffffff8106f946>] __local_bh_enable_ip+0xb6/0xc0
       [<ffffffff81752791>] ip6_finish_output2+0x251/0x7b0
       [<ffffffff81754af1>] ? ip6_fragment+0xba1/0xc50
       [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
       [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
       [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
       [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
       [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
       [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
       [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
       [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
       [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
       [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
       [<ffffffff81778558>] rawv6_sendmsg+0xa28/0xe30
       [<ffffffff81719097>] ? inet_sendmsg+0xc7/0x1d0
       [<ffffffff817190d6>] inet_sendmsg+0x106/0x1d0
       [<ffffffff81718fd5>] ? inet_sendmsg+0x5/0x1d0
       [<ffffffff8166d078>] sock_sendmsg+0x38/0x50
       [<ffffffff8166d4d6>] SYSC_sendto+0xf6/0x170
       [<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
       [<ffffffff8166e38e>] SyS_sendto+0xe/0x10
       [<ffffffff817bebe5>] entry_SYSCALL_64_fastpath+0x18/0xa8
      Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 d0 72 14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b 86 cc 00 00 00 49 8#
      RIP  [<ffffffff8175468a>] ip6_fragment+0x73a/0xc50
       RSP <ffff880072803120>
      
      Fixes: 029f7f3b ("netfilter: ipv6: nf_defrag: avoid/free clone
      operations")
      Reported-by: NDaniele Di Proietto <diproiettod@vmware.com>
      Signed-off-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49e261a8
  5. 21 4月, 2016 1 次提交
    • R
      rtnetlink: add new RTM_GETSTATS message to dump link stats · 10c9ead9
      Roopa Prabhu 提交于
      This patch adds a new RTM_GETSTATS message to query link stats via netlink
      from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
      returns a lot more than just stats and is expensive in some cases when
      frequent polling for stats from userspace is a common operation.
      
      RTM_GETSTATS is an attempt to provide a light weight netlink message
      to explicity query only link stats from the kernel on an interface.
      The idea is to also keep it extensible so that new kinds of stats can be
      added to it in the future.
      
      This patch adds the following attribute for NETDEV stats:
      struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
              [IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
      };
      
      Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
      a single interface or all interfaces with NLM_F_DUMP.
      
      Future possible new types of stat attributes:
      link af stats:
          - IFLA_STATS_LINK_IPV6  (nested. for ipv6 stats)
          - IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
      extended stats:
          - IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like bridge,
            vlan, vxlan etc)
          - IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
            available via ethtool today)
      
      This patch also declares a filter mask for all stat attributes.
      User has to provide a mask of stats attributes to query. filter mask
      can be specified in the new hdr 'struct if_stats_msg' for stats messages.
      Other important field in the header is the ifindex.
      
      This api can also include attributes for global stats (eg tcp) in the future.
      When global stats are included in a stats msg, the ifindex in the header
      must be zero. A single stats message cannot contain both global and
      netdev specific stats. To easily distinguish them, netdev specific stat
      attributes name are prefixed with IFLA_STATS_LINK_
      
      Without any attributes in the filter_mask, no stats will be returned.
      
      This patch has been tested with mofified iproute2 ifstat.
      Suggested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10c9ead9
  6. 20 4月, 2016 8 次提交