1. 24 4月, 2016 2 次提交
  2. 22 4月, 2016 12 次提交
    • S
      openvswitch: use flow protocol when recalculating ipv6 checksums · b4f70527
      Simon Horman 提交于
      When using masked actions the ipv6_proto field of an action
      to set IPv6 fields may be zero rather than the prevailing protocol
      which will result in skipping checksum recalculation.
      
      This patch resolves the problem by relying on the protocol
      in the flow key rather than that in the set field action.
      
      Fixes: 83d2b9ba ("net: openvswitch: Support masked set actions.")
      Cc: Jarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: NSimon Horman <simon.horman@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4f70527
    • A
      net: Add support for IP ID mangling TSO in cases that require encapsulation · 7f348a60
      Alexander Duyck 提交于
      This patch adds support for NETIF_F_TSO_MANGLEID if a given tunnel supports
      NETIF_F_TSO.  This way if needed a device can then later enable the TSO
      with IP ID mangling and the tunnels on top of that device can then also
      make use of the IP ID mangling as well.
      Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f348a60
    • M
      tcp: Merge tx_flags and tskey in tcp_shifted_skb · cfea5a68
      Martin KaFai Lau 提交于
      After receiving sacks, tcp_shifted_skb() will collapse
      skbs if possible.  tx_flags and tskey also have to be
      merged.
      
      This patch reuses the tcp_skb_collapse_tstamp() to handle
      them.
      
      BPF Output Before:
      ~~~~~
      <no-output-due-to-missing-tstamp-event>
      
      BPF Output After:
      ~~~~~
      <...>-2024  [007] d.s.    88.644374: : ee_data:14599
      
      Packetdrill Script:
      ~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 1460) = 1460
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 13140) = 13140
      
      0.200 > P. 1:1461(1460) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:14601(5840) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 14601 win 257
      
      0.400 close(4) = 0
      0.400 > F. 14601:14601(0) ack 1
      0.500 < F. 1:1(0) ack 14602 win 257
      0.500 > . 14602:14602(0) ack 2
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfea5a68
    • M
      tcp: Merge tx_flags and tskey in tcp_collapse_retrans · 082ac2d5
      Martin KaFai Lau 提交于
      If two skbs are merged/collapsed during retransmission, the current
      logic does not merge the tx_flags and tskey.  The end result is
      the SCM_TSTAMP_ACK timestamp could be missing for a packet.
      
      The patch:
      1. Merge the tx_flags
      2. Overwrite the prev_skb's tskey with the next_skb's tskey
      
      BPF Output Before:
      ~~~~~~
      <no-output-due-to-missing-tstamp-event>
      
      BPF Output After:
      ~~~~~~
      packetdrill-2092  [001] d.s.   453.998486: : ee_data:1459
      
      Packetdrill Script:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      0.200 write(4, ..., 11680) = 11680
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      
      0.200 > P. 1:731(730) ack 1
      0.200 > P. 731:1461(730) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:13141(4380) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 13141 win 257
      
      0.400 close(4) = 0
      0.400 > F. 13141:13141(0) ack 1
      0.500 < F. 1:1(0) ack 13142 win 257
      0.500 > . 13142:13142(0) ack 2
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      082ac2d5
    • N
      3d6b66c1
    • N
      a9a08042
    • N
      58414d32
    • P
      NLA_BINARY misuse bug in HSR · f9375729
      Peter Heise 提交于
      Removed .type field from NLA to do proper length checking.
      Reported by Daniel Borkmann and Julia Lawall.
      Signed-off-by: NPeter Heise <peter.heise@airbus.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9375729
    • X
      net: use jiffies_to_msecs to replace EXPIRES_IN_MS in inet/sctp_diag · b7de529c
      Xin Long 提交于
      EXPIRES_IN_MS macro comes from net/ipv4/inet_diag.c and dates
      back to before jiffies_to_msecs() has been introduced.
      
      Now we can remove it and use jiffies_to_msecs().
      Suggested-by: NJakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NJakub Sitnicki <jkbs@redhat.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b7de529c
    • M
      tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks · 479f85c3
      Martin KaFai Lau 提交于
      Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received,
      it could incorrectly think that a skb has already
      been acked and queue a SCM_TSTAMP_ACK cmsg to the
      sk->sk_error_queue.
      
      In tcp_ack_tstamp(), it checks
      'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'.
      If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill
      script, between() returns true but the tskey is actually not acked.
      e.g. try between(3, 2, 1).
      
      The fix is to replace between() with one before() and one !before().
      By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be
      removed.
      
      A packetdrill script is used to reproduce the dup ack scenario.
      Due to the lacking cmsg support in packetdrill (may be I
      cannot find it),  a BPF prog is used to kprobe to
      sock_queue_err_skb() and print out the value of
      serr->ee.ee_data.
      
      Both the packetdrill and the bcc BPF script is attached at the end of
      this commit message.
      
      BPF Output Before Fix:
      ~~~~~~
            <...>-2056  [001] d.s.   433.927987: : ee_data:1459  #incorrect
      packetdrill-2056  [001] d.s.   433.929563: : ee_data:1459  #incorrect
      packetdrill-2056  [001] d.s.   433.930765: : ee_data:1459  #incorrect
      packetdrill-2056  [001] d.s.   434.028177: : ee_data:1459
      packetdrill-2056  [001] d.s.   434.029686: : ee_data:14599
      
      BPF Output After Fix:
      ~~~~~~
            <...>-2049  [000] d.s.   113.517039: : ee_data:1459
            <...>-2049  [000] d.s.   113.517253: : ee_data:14599
      
      BCC BPF Script:
      ~~~~~~
      #!/usr/bin/env python
      
      from __future__ import print_function
      from bcc import BPF
      
      bpf_text = """
      #include <uapi/linux/ptrace.h>
      #include <net/sock.h>
      #include <bcc/proto.h>
      #include <linux/errqueue.h>
      
      #ifdef memset
      #undef memset
      #endif
      
      int trace_err_skb(struct pt_regs *ctx)
      {
      	struct sk_buff *skb = (struct sk_buff *)ctx->si;
      	struct sock *sk = (struct sock *)ctx->di;
      	struct sock_exterr_skb *serr;
      	u32 ee_data = 0;
      
      	if (!sk || !skb)
      		return 0;
      
      	serr = SKB_EXT_ERR(skb);
      	bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
      	bpf_trace_printk("ee_data:%u\\n", ee_data);
      
      	return 0;
      };
      """
      
      b = BPF(text=bpf_text)
      b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
      print("Attached to kprobe")
      b.trace_print()
      
      Packetdrill Script:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 1460) = 1460
      0.200 write(4, ..., 13140) = 13140
      
      0.200 > P. 1:1461(1460) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:14601(5840) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 14601 win 257
      
      0.400 close(4) = 0
      0.400 > F. 14601:14601(0) ack 1
      0.500 < F. 1:1(0) ack 14602 win 257
      0.500 > . 14602:14602(0) ack 2
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      479f85c3
    • V
      net: dsa: remove tag_protocol from dsa_switch · c60c9840
      Vivien Didelot 提交于
      Having the tag protocol in dsa_switch_driver for setup time and in
      dsa_switch_tree for runtime is enough. Remove dsa_switch's one.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c60c9840
    • J
      openvswitch: Orphan skbs before IPv6 defrag · 49e261a8
      Joe Stringer 提交于
      This is the IPv6 counterpart to commit 8282f274 ("inet: frag: Always
      orphan skbs inside ip_defrag()").
      
      Prior to commit 029f7f3b ("netfilter: ipv6: nf_defrag: avoid/free
      clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
      cloned (implicitly orphaning) prior to queueing for reassembly. As such,
      when the IPv6 message is eventually reassembled, the skb->sk for all
      fragments would be NULL. After that commit was introduced, rather than
      cloning, the original skbs were queued directly without orphaning. The
      end result is that all frags except for the first and last may have a
      socket attached.
      
      This commit explicitly orphans such skbs during nf_ct_frag6_gather() to
      prevent BUG_ON(skb->sk) during a later call to ip6_fragment().
      
      kernel BUG at net/ipv6/ip6_output.c:631!
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffffa042c7c0>] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch]
       [<ffffffff810bb8a2>] ? __lock_is_held+0x52/0x70
       [<ffffffffa042c587>] ovs_fragment+0x1f7/0x280 [openvswitch]
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
       [<ffffffff81697ea0>] ? dst_discard_out+0x20/0x20
       [<ffffffff81697e80>] ? dst_ifdown+0x80/0x80
       [<ffffffffa042c703>] do_output.isra.28+0xf3/0x1b0 [openvswitch]
       [<ffffffffa042d279>] do_execute_actions+0x709/0x12c0 [openvswitch]
       [<ffffffffa04340a4>] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch]
       [<ffffffffa04340d1>] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch]
       [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffffa042de75>] ovs_execute_actions+0x45/0x120 [openvswitch]
       [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
       [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffffa042def4>] ovs_execute_actions+0xc4/0x120 [openvswitch]
       [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
       [<ffffffffa04337f2>] ? key_extract+0x442/0xc10 [openvswitch]
       [<ffffffffa043b26d>] ovs_vport_receive+0x5d/0xb0 [openvswitch]
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
       [<ffffffffa043c11d>] internal_dev_xmit+0x6d/0x150 [openvswitch]
       [<ffffffffa043c0b5>] ? internal_dev_xmit+0x5/0x150 [openvswitch]
       [<ffffffff8168fb5f>] dev_hard_start_xmit+0x2df/0x660
       [<ffffffff8168f5ea>] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0
       [<ffffffff81690925>] __dev_queue_xmit+0x8f5/0x950
       [<ffffffff81690080>] ? __dev_queue_xmit+0x50/0x950
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff81690990>] dev_queue_xmit+0x10/0x20
       [<ffffffff8169a418>] neigh_resolve_output+0x178/0x220
       [<ffffffff81752759>] ? ip6_finish_output2+0x219/0x7b0
       [<ffffffff81752759>] ip6_finish_output2+0x219/0x7b0
       [<ffffffff817525a5>] ? ip6_finish_output2+0x65/0x7b0
       [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
       [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
       [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
       [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
       [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
       [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
       [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
       [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
       [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
       [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
       [<ffffffff817796cc>] icmpv6_push_pending_frames+0xac/0xe0
       [<ffffffff8177a4be>] icmpv6_echo_reply+0x42e/0x500
       [<ffffffff8177acbf>] icmpv6_rcv+0x4cf/0x580
       [<ffffffff81755ac7>] ip6_input_finish+0x1a7/0x690
       [<ffffffff81755925>] ? ip6_input_finish+0x5/0x690
       [<ffffffff817567a0>] ip6_input+0x30/0xa0
       [<ffffffff81755920>] ? ip6_rcv_finish+0x1a0/0x1a0
       [<ffffffff817557ce>] ip6_rcv_finish+0x4e/0x1a0
       [<ffffffff8175640f>] ipv6_rcv+0x45f/0x7c0
       [<ffffffff81755fe6>] ? ipv6_rcv+0x36/0x7c0
       [<ffffffff81755780>] ? ip6_make_skb+0x1c0/0x1c0
       [<ffffffff8168b649>] __netif_receive_skb_core+0x229/0xb80
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff8168c07f>] ? process_backlog+0x6f/0x230
       [<ffffffff8168bfb6>] __netif_receive_skb+0x16/0x70
       [<ffffffff8168c088>] process_backlog+0x78/0x230
       [<ffffffff8168c0ed>] ? process_backlog+0xdd/0x230
       [<ffffffff8168db43>] net_rx_action+0x203/0x480
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff817c156e>] __do_softirq+0xde/0x49f
       [<ffffffff81752768>] ? ip6_finish_output2+0x228/0x7b0
       [<ffffffff817c070c>] do_softirq_own_stack+0x1c/0x30
       <EOI>
       [<ffffffff8106f88b>] do_softirq.part.18+0x3b/0x40
       [<ffffffff8106f946>] __local_bh_enable_ip+0xb6/0xc0
       [<ffffffff81752791>] ip6_finish_output2+0x251/0x7b0
       [<ffffffff81754af1>] ? ip6_fragment+0xba1/0xc50
       [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
       [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
       [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
       [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
       [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
       [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
       [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
       [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
       [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
       [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
       [<ffffffff81778558>] rawv6_sendmsg+0xa28/0xe30
       [<ffffffff81719097>] ? inet_sendmsg+0xc7/0x1d0
       [<ffffffff817190d6>] inet_sendmsg+0x106/0x1d0
       [<ffffffff81718fd5>] ? inet_sendmsg+0x5/0x1d0
       [<ffffffff8166d078>] sock_sendmsg+0x38/0x50
       [<ffffffff8166d4d6>] SYSC_sendto+0xf6/0x170
       [<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
       [<ffffffff8166e38e>] SyS_sendto+0xe/0x10
       [<ffffffff817bebe5>] entry_SYSCALL_64_fastpath+0x18/0xa8
      Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 d0 72 14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b 86 cc 00 00 00 49 8#
      RIP  [<ffffffff8175468a>] ip6_fragment+0x73a/0xc50
       RSP <ffff880072803120>
      
      Fixes: 029f7f3b ("netfilter: ipv6: nf_defrag: avoid/free clone
      operations")
      Reported-by: NDaniele Di Proietto <diproiettod@vmware.com>
      Signed-off-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49e261a8
  3. 21 4月, 2016 1 次提交
    • R
      rtnetlink: add new RTM_GETSTATS message to dump link stats · 10c9ead9
      Roopa Prabhu 提交于
      This patch adds a new RTM_GETSTATS message to query link stats via netlink
      from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
      returns a lot more than just stats and is expensive in some cases when
      frequent polling for stats from userspace is a common operation.
      
      RTM_GETSTATS is an attempt to provide a light weight netlink message
      to explicity query only link stats from the kernel on an interface.
      The idea is to also keep it extensible so that new kinds of stats can be
      added to it in the future.
      
      This patch adds the following attribute for NETDEV stats:
      struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
              [IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
      };
      
      Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
      a single interface or all interfaces with NLM_F_DUMP.
      
      Future possible new types of stat attributes:
      link af stats:
          - IFLA_STATS_LINK_IPV6  (nested. for ipv6 stats)
          - IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
      extended stats:
          - IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like bridge,
            vlan, vxlan etc)
          - IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
            available via ethtool today)
      
      This patch also declares a filter mask for all stat attributes.
      User has to provide a mask of stats attributes to query. filter mask
      can be specified in the new hdr 'struct if_stats_msg' for stats messages.
      Other important field in the header is the ifindex.
      
      This api can also include attributes for global stats (eg tcp) in the future.
      When global stats are included in a stats msg, the ifindex in the header
      must be zero. A single stats message cannot contain both global and
      netdev specific stats. To easily distinguish them, netdev specific stat
      attributes name are prefixed with IFLA_STATS_LINK_
      
      Without any attributes in the filter_mask, no stats will be returned.
      
      This patch has been tested with mofified iproute2 ifstat.
      Suggested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10c9ead9
  4. 20 4月, 2016 7 次提交
  5. 19 4月, 2016 3 次提交
  6. 18 4月, 2016 1 次提交
  7. 17 4月, 2016 8 次提交
  8. 16 4月, 2016 6 次提交
    • D
      vlan: pull on __vlan_insert_tag error path and fix csum correction · 9241e2df
      Daniel Borkmann 提交于
      When __vlan_insert_tag() fails from skb_vlan_push() path due to the
      skb_cow_head(), we need to undo the __skb_push() in the error path
      as well that was done earlier to move skb->data pointer to mac header.
      
      Moreover, I noticed that when in the non-error path the __skb_pull()
      is done and the original offset to mac header was non-zero, we fixup
      from a wrong skb->data offset in the checksum complete processing.
      
      So the skb_postpush_rcsum() really needs to be done before __skb_pull()
      where skb->data still points to the mac header start and thus operates
      under the same conditions as in __vlan_insert_tag().
      
      Fixes: 93515d53 ("net: move vlan pop/push functions into common code")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9241e2df
    • X
      sctp: fix some rhashtable functions using in sctp proc/diag · 53fa1036
      Xin Long 提交于
      When rhashtable_walk_init return err, no release function should be
      called, and when rhashtable_walk_start return err, we should only invoke
      rhashtable_walk_exit to release the source.
      
      But now when sctp_transport_walk_start return err, we just call
      rhashtable_walk_stop/exit, and never care about if rhashtable_walk_init
      or start return err, which is so bad.
      
      We will fix it by calling rhashtable_walk_exit if rhashtable_walk_start
      return err in sctp_transport_walk_start, and if sctp_transport_walk_start
      return err, we do not need to call sctp_transport_walk_stop any more.
      
      For sctp proc, we will use 'iter->start_fail' to decide if we will call
      rhashtable_walk_stop/exit.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53fa1036
    • X
      sctp: merge the seq_start/next/exits in remaddrs and assocs · b5e2f4e6
      Xin Long 提交于
      In sctp proc, these three functions in remaddrs and assocs are the
      same. we should merge them into one.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b5e2f4e6
    • X
      sctp: add the sctp_diag.c file · 8f840e47
      Xin Long 提交于
      This one will implement all the interface of inet_diag, inet_diag_handler.
      which includes sctp_diag_dump, sctp_diag_dump_one and sctp_diag_get_info.
      
      It will work as a module, and register inet_diag_handler when loading.
      
      v2->v3:
      - fix the mistake in inet_assoc_attr_size().
      
      - change inet_diag_msg_laddrs_fill() name to inet_diag_msg_sctpladdrs_fill.
      
      - change inet_diag_msg_paddrs_fill() name to inet_diag_msg_sctpaddrs_fill.
      
      - add inet_diag_msg_sctpinfo_fill() to make asoc/ep fill code clearer.
      
      - add inet_diag_msg_sctpasoc_fill() to make asoc fill code clearer.
      
      - merge inet_asoc_diag_fill() and inet_ep_diag_fill() to
        inet_sctp_diag_fill().
      
      - call sctp_diag_get_info() directly, instead by handler, cause the caller
        is in the same file with it.
      
      - call lock_sock in sctp_tsp_dump_one() to make sure we call get sctp info
        safely.
      
      - after lock_sock(sk), we should check sk != assoc->base.sk.
      
      - change mem[SK_MEMINFO_WMEM_ALLOC] to asoc->sndbuf_used for asoc dump when
        asoc->ep->sndbuf_policy is set. don't use INET_DIAG_MEMINFO attr any more.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f840e47
    • X
      sctp: export some functions for sctp_diag in inet_diag · cb2050a7
      Xin Long 提交于
      inet_diag_msg_common_fill is used to fill the diag msg common info,
      we need to use it in sctp_diag as well, so export it.
      
      inet_diag_msg_attrs_fill is used to fill some common attrs info between
      sctp diag and tcp diag.
      
      v2->v3:
      - do not need to define and export inet_diag_get_handler any more.
        cause all the functions in it are in sctp_diag.ko, we just call
        them in sctp_diag.ko.
      
      - add inet_diag_msg_attrs_fill to make codes clear.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb2050a7
    • X
      sctp: export some apis or variables for sctp_diag and reuse some for proc · 626d16f5
      Xin Long 提交于
      For some main variables in sctp.ko, we couldn't export it to other modules,
      so we have to define some api to access them.
      
      It will include sctp transport and endpoint's traversal.
      
      There are some transport traversal functions for sctp_diag, we can also
      use it for sctp_proc. cause they have the similar situation to traversal
      transport.
      
      v2->v3:
      - rhashtable_walk_init need the parameter gfp, because of recent upstrem
        update
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      626d16f5