1. 08 Feb 2016, 10 commits
  2. 07 Feb 2016, 1 commit
    • tcp: fastopen: call tcp_fin() if FIN present in SYNACK · e3e17b77
      Committed by Eric Dumazet
      When we acknowledge a FIN, it is not enough to ack the sequence number
      and queue the skb into receive queue. We also have to call tcp_fin()
      to properly update socket state and send proper poll() notifications.
      
      It seems we also had this problem when receiving a SYN packet with the
      FIN flag set, but that does not look like an urgent issue, as no known
      implementation can do that. (A sketch of the idea behind the fix follows
      this entry.)
      
      Fixes: 61d2bcae ("tcp: fastopen: accept data/FIN present in SYNACK message")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
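      The following is a minimal userspace sketch of the idea behind this fix,
      not the kernel code itself: when the queued segment carries a FIN,
      advancing rcv_nxt and queuing the data is not enough; a tcp_fin()-style
      handler must also run so that the socket state changes and poll() waiters
      are woken. All structure and function names below are illustrative.

      /* Minimal model (not kernel code): handle FIN while queueing SYNACK data. */
      #include <stdbool.h>
      #include <stdio.h>

      enum sock_state { ESTABLISHED, CLOSE_WAIT };

      struct fake_sock {
              enum sock_state state;
              unsigned int rcv_nxt;
              bool poll_hup;            /* would surface as EPOLLHUP/EPOLLRDHUP */
      };

      /* Counterpart of tcp_fin(): update socket state and poll flags. */
      static void model_fin(struct fake_sock *sk)
      {
              sk->state = CLOSE_WAIT;
              sk->poll_hup = true;
      }

      /* Counterpart of the fastopen SYNACK data path. */
      static void model_queue_synack_data(struct fake_sock *sk,
                                          unsigned int len, bool fin)
      {
              sk->rcv_nxt += len + (fin ? 1 : 0);   /* a FIN consumes one sequence number */
              /* ... data would be appended to the receive queue here ... */
              if (fin)
                      model_fin(sk);                /* the call the original code was missing */
      }

      int main(void)
      {
              struct fake_sock sk = { ESTABLISHED, 1, false };

              model_queue_synack_data(&sk, 100, true);
              printf("state=%s rcv_nxt=%u hup=%d\n",
                     sk.state == CLOSE_WAIT ? "CLOSE_WAIT" : "ESTABLISHED",
                     sk.rcv_nxt, sk.poll_hup);
              return 0;
      }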
  3. 06 Feb 2016, 2 commits
  4. 30 Jan 2016, 4 commits
  5. 29 Jan 2016, 3 commits
    • tcp: beware of alignments in tcp_get_info() · ff5d7497
      Committed by Eric Dumazet
      With some combinations of user-provided flags in a netlink command,
      it is possible to call tcp_get_info() with a buffer that is not
      8-byte aligned.

      This matters on some architectures, so we need to use put_unaligned()
      to store the u64 fields. (See the sketch after this entry.)

      The current iproute2 package does not trigger this particular issue.
      
      Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
      Fixes: 977cb0ec ("tcp: add pacing_rate information into tcp_info")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
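      A small userspace sketch of the problem and the fix idea, not the kernel
      code: dereferencing a u64 pointer that might be unaligned is undefined
      behaviour or a trap on some architectures, whereas a byte-wise store
      helper (memcpy here, put_unaligned() in the kernel) is always safe. The
      helper name below is illustrative.

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      /* Userspace stand-in for the kernel's put_unaligned(): a byte-wise
       * copy that the compiler lowers to whatever the CPU can do safely. */
      static inline void put_unaligned_u64(uint64_t val, void *p)
      {
              memcpy(p, &val, sizeof(val));
      }

      int main(void)
      {
              unsigned char buf[32];
              void *unaligned = buf + 1;   /* pretend netlink gave us an odd offset */

              /* *(uint64_t *)unaligned = 42;   <- may fault on some architectures */
              put_unaligned_u64(42, unaligned);

              uint64_t check;
              memcpy(&check, unaligned, sizeof(check));
              printf("stored %llu at offset +1\n", (unsigned long long)check);
              return 0;
      }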
    • tcp: fix tcp_mark_head_lost to check skb len before fragmenting · d88270ee
      Committed by Neal Cardwell
      This commit fixes a corner case in tcp_mark_head_lost() which was
      causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.
      
      tcp_mark_head_lost() was assuming that if a packet has
      tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
      M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
      maintained, this is not always true.
      
      For example, suppose the sender sends 4 one-byte packets and the last
      3 packets get SACKed. TCP will merge the last 3 packets in the write
      queue into an skb with pcount = 3 and len = 3 bytes. If another
      recovery happens after a SACK reneging event, tcp_mark_head_lost()
      may attempt to split the skb assuming it has more than 2*MSS bytes.
      
      This sounds very counterintuitive, but as the commit description for
      the related commit c0638c24 ("tcp: don't fragment SACKed skbs in
      tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
      coalesces adjacent regions of SACKed skbs, and when doing this it
      preserves the sum of their packet counts in order to reflect the
      real-world dynamics on the wire. The c0638c24 commit tried to
      avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
      where the non-proportionality between pcount and skb->len/mss is known
      to be possible. However, that commit did not handle the case where
      during a reneging event one of these weird SACKed skbs becomes an
      un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.
      
      The fix is to simply mark the entire skb lost when this happens (see
      the sketch after this entry). This makes the recovery slightly more
      aggressive in such corner cases before we detect reordering. But once
      we detect reordering, this code path is bypassed because FACK is
      disabled.

      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
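      A minimal userspace model of the guard described above, not the kernel
      code: if an skb's packet count says it covers several packets but its
      byte length cannot supply the requested prefix, do not try to fragment
      it; mark the whole skb lost. All names are illustrative.

      #include <stdio.h>

      struct fake_skb {
              unsigned int len;      /* bytes in this skb */
              unsigned int pcount;   /* packet count; may exceed len/mss after SACK merges */
      };

      static void mark_head_lost(const struct fake_skb *skb, unsigned int mss,
                                 unsigned int packets_to_mark)
      {
              if (packets_to_mark >= skb->pcount ||
                  skb->len <= packets_to_mark * mss) {
                      /* Too short to split off packets_to_mark * mss bytes
                       * (or everything is to be marked): mark it all lost. */
                      printf("mark whole skb lost (%u bytes, pcount %u)\n",
                             skb->len, skb->pcount);
              } else {
                      printf("fragment off %u bytes and mark that prefix lost\n",
                             packets_to_mark * mss);
              }
      }

      int main(void)
      {
              /* The corner case from the commit message: three 1-byte packets
               * merged into one skb with pcount = 3 and len = 3 bytes. */
              struct fake_skb weird = { .len = 3, .pcount = 3 };

              mark_head_lost(&weird, 1460, 2);   /* old code tried to split off 2*MSS */
              return 0;
      }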
    • inet: frag: Always orphan skbs inside ip_defrag() · 8282f274
      Committed by Joe Stringer
      Later parts of the stack (including fragmentation) expect that there is
      never a socket attached to a frag in a frag_list; however, this invariant
      was not enforced on all defrag paths. This could lead to the
      BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
      end of this commit message.

      While the call could be added to openvswitch to fix this particular
      error, the head and tail of the frags list are already orphaned
      indirectly inside ip_defrag(), so it seems the remaining fragments
      should be orphaned in all circumstances as well. (A sketch of this
      invariant follows this entry.)
      
      kernel BUG at net/ipv4/ip_output.c:586!
      [...]
      Call Trace:
       <IRQ>
       [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
       [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch]
       [<ffffffff81667830>] ? dst_discard_out+0x20/0x20
       [<ffffffff81667810>] ? dst_ifdown+0x80/0x80
       [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
       [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch]
       [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch]
       [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch]
       [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
       [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
       [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch]
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
       [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch]
       [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
       [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0
       [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0
       [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930
       [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930
       [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
       [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220
       [<ffffffff81661060>] dev_queue_xmit+0x10/0x20
       [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220
       [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590
       [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590
       [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590
       [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
       [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
       [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
       [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
       [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
       [<ffffffff816ab4c0>] ip_output+0x70/0x110
       [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
       [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
       [<ffffffff816abf89>] ip_send_skb+0x19/0x40
       [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
       [<ffffffff816df21a>] icmp_push_reply+0xea/0x120
       [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230
       [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70
       [<ffffffff816dfa06>] icmp_echo+0x36/0x70
       [<ffffffff816e0d11>] icmp_rcv+0x271/0x450
       [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0
       [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0
       [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0
       [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560
       [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560
       [<ffffffff816a5453>] ip_rcv+0x283/0x3e0
       [<ffffffff810b6302>] ? match_held_lock+0x192/0x200
       [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40
       [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0
       [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230
       [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
       [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60
       [<ffffffff8165e678>] process_backlog+0x78/0x230
       [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230
       [<ffffffff8165e355>] net_rx_action+0x155/0x400
       [<ffffffff8106b48c>] __do_softirq+0xcc/0x420
       [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590
       [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30
       <EOI>
       [<ffffffff8106b88e>] do_softirq+0x4e/0x60
       [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0
       [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590
       [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0
       [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
       [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
       [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
       [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
       [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
       [<ffffffff816ab4c0>] ip_output+0x70/0x110
       [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
       [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
       [<ffffffff816abf89>] ip_send_skb+0x19/0x40
       [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
       [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0
       [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0
       [<ffffffff8163e398>] sock_sendmsg+0x38/0x50
       [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270
       [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320
       [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460
       [<ffffffff81204886>] ? __fget_light+0x66/0x90
       [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80
       [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20
       [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f
      Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
       66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
      RIP  [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0
       RSP <ffff88006d603170>
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
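      A small userspace model of the invariant described above, not the kernel
      code: every skb entering the defrag queue is unconditionally orphaned, so
      later code (such as refragmentation) never sees skb->sk set. The names
      are illustrative; the real kernel helper is skb_orphan().

      #include <stdio.h>
      #include <stddef.h>

      struct fake_sock { int refcnt; };

      struct fake_skb {
              struct fake_sock *sk;         /* owning socket, if any */
              struct fake_skb *frag_next;   /* next fragment in the frag list */
      };

      /* Counterpart of skb_orphan(): release the skb's hold on its socket. */
      static void skb_orphan_model(struct fake_skb *skb)
      {
              if (skb->sk) {
                      skb->sk->refcnt--;
                      skb->sk = NULL;
              }
      }

      /* Counterpart of ip_defrag(): orphan every skb before queueing it. */
      static void defrag_enqueue(struct fake_skb *skb, struct fake_skb **queue)
      {
              skb_orphan_model(skb);
              skb->frag_next = *queue;
              *queue = skb;
      }

      int main(void)
      {
              struct fake_sock sk = { .refcnt = 1 };
              struct fake_skb frag = { .sk = &sk, .frag_next = NULL };
              struct fake_skb *queue = NULL;

              defrag_enqueue(&frag, &queue);
              printf("frag sk=%p, socket refcnt=%d\n", (void *)frag.sk, sk.refcnt);
              return 0;
      }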
  6. 26 Jan 2016, 1 commit
  7. 23 Jan 2016, 1 commit
  8. 22 Jan 2016, 1 commit
    • tcp: fix NULL deref in tcp_v4_send_ack() · e62a123b
      Committed by Eric Dumazet
      Neal reported crashes with this stack trace:
      
       RIP: 0010:[<ffffffff8c57231b>] tcp_v4_send_ack+0x41/0x20f
      ...
       CR2: 0000000000000018 CR3: 000000044005c000 CR4: 00000000001427e0
      ...
        [<ffffffff8c57258e>] tcp_v4_reqsk_send_ack+0xa5/0xb4
        [<ffffffff8c1a7caa>] tcp_check_req+0x2ea/0x3e0
        [<ffffffff8c19e420>] tcp_rcv_state_process+0x850/0x2500
        [<ffffffff8c1a6d21>] tcp_v4_do_rcv+0x141/0x330
        [<ffffffff8c56cdb2>] sk_backlog_rcv+0x21/0x30
        [<ffffffff8c098bbd>] tcp_recvmsg+0x75d/0xf90
        [<ffffffff8c0a8700>] inet_recvmsg+0x80/0xa0
        [<ffffffff8c17623e>] sock_aio_read+0xee/0x110
        [<ffffffff8c066fcf>] do_sync_read+0x6f/0xa0
        [<ffffffff8c0673a1>] SyS_read+0x1e1/0x290
        [<ffffffff8c5ca262>] system_call_fastpath+0x16/0x1b
      
      The problem here is that the skb we provide to tcp_v4_send_ack() had to
      be parked in the backlog of a new TCP fastopen child because this child
      was owned by the user at the time an out-of-window packet arrived.

      Before queuing a packet, TCP has to set skb->dev to NULL, as the device
      could disappear before the packet is removed from the queue.

      Fix this issue by using the net pointer provided by the socket (be it a
      timewait or a request socket), as sketched after this entry.

      IPv6 is immune to the bug: tcp_v6_send_response() already gets the net
      pointer from the socket if one is provided.
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Reported-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jerry Chu <hkchu@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
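      A minimal userspace model of the fix idea, not the kernel code: a
      backlogged skb may have had its dev pointer cleared, so the network
      namespace must come from the socket handling it rather than from
      skb->dev. All structure and function names here are illustrative.

      #include <stdio.h>
      #include <stddef.h>

      struct fake_net  { const char *name; };
      struct fake_dev  { struct fake_net *net; };
      struct fake_sock { struct fake_net *net; };
      struct fake_skb  { struct fake_dev *dev; };

      static struct fake_net *send_ack_net(const struct fake_sock *sk,
                                           const struct fake_skb *skb)
      {
              /* Buggy variant: return skb->dev->net;  crashes when dev == NULL. */
              (void)skb;
              /* Fixed variant: take the namespace recorded in the socket. */
              return sk->net;
      }

      int main(void)
      {
              struct fake_net init_net = { "init_net" };
              struct fake_sock req_sk = { &init_net };
              struct fake_skb backlogged = { .dev = NULL };  /* dev cleared before queueing */

              printf("net = %s\n", send_ack_net(&req_sk, &backlogged)->name);
              return 0;
      }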
  9. 21 Jan 2016, 4 commits
  10. 20 Jan 2016, 1 commit
    • udp: fix potential infinite loop in SO_REUSEPORT logic · ed0dfffd
      Committed by Eric Dumazet
      Using a combination of connected and unconnected sockets, Dmitry was
      able to trigger soft lockups with his fuzzer.

      The problem is that sockets in the SO_REUSEPORT array might have
      different scores.

      Right after sk2 = socket(), setsockopt(sk2, ..., SO_REUSEPORT, on) and
      bind(sk2, ...), but _before_ connect(sk2) is done, sk2 is added into the
      soreuseport array with a score that is smaller than the score of the
      first socket sk1 found in the regular UDP hash table, assuming sk1 has
      already done its connect(), which adds +8 to its score.
      
      hash bucket [X] -> sk1 -> sk2 -> NULL
      
      sk1 score = 14  (because it did a connect())
      sk2 score = 6
      
      SO_REUSEPORT fast selection is an optimization. If it turns out the
      score of the selected socket does not match the score of the first
      socket, just fall back to the old SO_REUSEPORT logic instead of trying
      to be too smart (see the sketch after this entry).

      Normal SO_REUSEPORT users do not mix different kinds of sockets, as
      this mechanism is used to load-balance traffic.
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Craig Gallek <kraigatgoog@gmail.com>
      Acked-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
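      A minimal userspace model of the fallback described above, not the
      kernel code: the fast reuseport pick is only trusted when the chosen
      socket would score the same as the first matching socket; on a mismatch
      the code falls back to the ordinary best-score scan. Scores and names
      are illustrative.

      #include <stdbool.h>
      #include <stdio.h>

      struct fake_udp_sock {
              bool connected;   /* a connected UDP socket scores higher */
      };

      /* Simplified scoring: base 6, +8 once connect() has been done. */
      static int compute_score(const struct fake_udp_sock *sk)
      {
              return 6 + (sk->connected ? 8 : 0);
      }

      static const struct fake_udp_sock *
      select_sock(const struct fake_udp_sock *first,
                  const struct fake_udp_sock *fast_pick)
      {
              /* Trust the fast path only if its pick scores like the first
               * socket found in the regular hash chain. */
              if (compute_score(fast_pick) == compute_score(first))
                      return fast_pick;

              /* Mismatch (e.g. sk1 connected, sk2 not yet): fall back to the
               * ordinary lookup instead of looping on the fast path. */
              return first;
      }

      int main(void)
      {
              struct fake_udp_sock sk1 = { .connected = true  };   /* score 14 */
              struct fake_udp_sock sk2 = { .connected = false };   /* score 6  */

              const struct fake_udp_sock *chosen = select_sock(&sk1, &sk2);
              printf("chose %s\n", chosen == &sk1 ? "sk1 (fallback path)"
                                                  : "sk2 (fast path)");
              return 0;
      }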
  11. 16 Jan 2016, 1 commit
  12. 15 Jan 2016, 8 commits
  13. 12 Jan 2016, 2 commits
  14. 11 Jan 2016, 1 commit