1. 29 Jan 2016 (6 commits)
    • switchdev: Require RTNL mutex to be held when sending FDB notifications · 4f2c6ae5
      Authored by Ido Schimmel
      When switchdev drivers process FDB notifications from the underlying
      device, they resolve the netdev to which the entry points and notify
      the bridge using the switchdev notifier.
      
      However, since the RTNL mutex is not held, there is nothing preventing
      the netdev from disappearing in the middle, which will cause
      br_switchdev_event() to dereference a non-existent netdev.
      
      Make switchdev drivers hold the lock at the beginning of the
      notification processing session and release it once it ends, after
      notifying the bridge.
      
      Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
      once the RTNL mutex is held (a hedged sketch of the resulting locking
      pattern follows this entry).
      
      Fixes: 03bf0c28 ("switchdev: introduce switchdev notifier")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f2c6ae5
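
      A minimal sketch of the locking pattern described above, using hypothetical
      driver names (mydrv_*); it illustrates the commit's idea and is not the
      upstream patch itself:

      #include <linux/etherdevice.h>
      #include <linux/rtnetlink.h>     /* rtnl_lock(), rtnl_unlock() */
      #include <linux/slab.h>
      #include <linux/workqueue.h>
      #include <net/switchdev.h>       /* call_switchdev_notifiers() */

      /* Hypothetical deferred-work item for a hardware "FDB entry learned" event. */
      struct mydrv_fdb_event {
              struct work_struct work;
              unsigned char mac[ETH_ALEN];
              u16 vid;
              int port;
      };

      /* Hypothetical stub: map a hardware port index to its net_device. */
      static struct net_device *mydrv_port_to_netdev(int port)
      {
              return NULL;            /* a real driver would consult its port table */
      }

      static void mydrv_fdb_event_work(struct work_struct *work)
      {
              struct mydrv_fdb_event *ev =
                      container_of(work, struct mydrv_fdb_event, work);
              struct switchdev_notifier_fdb_info info = {
                      .addr = ev->mac,
                      .vid  = ev->vid,
              };
              struct net_device *dev;

              /* Hold RTNL for the whole notification session so the netdev
               * resolved below cannot be unregistered underneath us or the bridge.
               */
              rtnl_lock();
              dev = mydrv_port_to_netdev(ev->port);
              if (dev)
                      call_switchdev_notifiers(SWITCHDEV_FDB_ADD, dev, &info.info);
              rtnl_unlock();

              kfree(ev);
      }
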
    • tcp: fix tcp_mark_head_lost to check skb len before fragmenting · d88270ee
      Authored by Neal Cardwell
      This commit fixes a corner case in tcp_mark_head_lost() which was
      causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.
      
      tcp_mark_head_lost() was assuming that if a packet has
      tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
      M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
      maintained, this is not always true.
      
      For example, suppose the sender sends four 1-byte packets and has the
      last three SACKed. It will merge the last three packets in the write
      queue into an skb with pcount = 3 and len = 3 bytes. If another
      recovery happens after a SACK reneging event, tcp_mark_head_lost()
      may attempt to split the skb assuming it has more than 2*MSS bytes.
      
      This sounds very counterintuitive, but as the commit description for
      the related commit c0638c24 ("tcp: don't fragment SACKed skbs in
      tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
      coalesces adjacent regions of SACKed skbs, and when doing this it
      preserves the sum of their packet counts in order to reflect the
      real-world dynamics on the wire. The c0638c24 commit tried to
      avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
      where the non-proportionality between pcount and skb->len/mss is known
      to be possible. However, that commit did not handle the case where
      during a reneging event one of these weird SACKed skbs becomes an
      un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.
      
      The fix is to simply mark the entire skb lost when this happens (a
      hedged sketch of the resulting length check follows this entry). This
      makes the recovery slightly more aggressive in such corner cases before
      we detect reordering, but once reordering is detected this code path is
      bypassed because FACK is disabled.
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d88270ee
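
      A hedged sketch of the corner-case handling described above, using simplified
      variable names rather than the literal upstream diff: before chopping off a
      prefix to mark as lost, check that the skb really carries that many bytes;
      if it does not, fall through and mark the whole skb lost.

      #include <net/tcp.h>

      /* Sketch of the guard inside tcp_mark_head_lost(); `packets` is the target
       * number of packets to mark lost, `oldcnt` how many were already counted
       * before this skb, and `mss` the current MSS.
       */
      static void mark_head_lost_guard_sketch(struct sock *sk, struct sk_buff *skb,
                                              int packets, int oldcnt,
                                              unsigned int mss)
      {
              unsigned int lost = (packets - oldcnt) * mss;

              if (skb->len > lost) {
                      /* Enough payload: split off a prefix of `lost` bytes. */
                      if (tcp_fragment(sk, skb, lost, mss, GFP_ATOMIC) < 0)
                              return;         /* allocation failure: give up */
              }
              /* Either the prefix was split off, or skb->len disagrees with the
               * pcount (coalesced SACKed data that was later reneged). In both
               * cases mark this skb lost as usual; the second case is slightly
               * more aggressive but avoids WARN_ON(len > skb->len) in
               * tcp_fragment().
               */
      }
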
    • inet: frag: Always orphan skbs inside ip_defrag() · 8282f274
      Authored by Joe Stringer
      Later parts of the stack (including fragmentation) expect that there is
      never a socket attached to a fragment on a frag_list, but this invariant
      was not enforced on all defrag paths. This could lead to the
      BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
      end of this commit message.
      
      While the orphan call could be added to openvswitch to fix this
      particular error, the head and tail of the frag list are already
      orphaned indirectly inside ip_defrag(), so it seems the remaining
      fragments should be orphaned in all circumstances as well (a hedged
      sketch of this follows this entry).
      
      kernel BUG at net/ipv4/ip_output.c:586!
      [...]
      Call Trace:
       <IRQ>
       [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
       [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch]
       [<ffffffff81667830>] ? dst_discard_out+0x20/0x20
       [<ffffffff81667810>] ? dst_ifdown+0x80/0x80
       [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
       [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch]
       [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch]
       [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch]
       [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
       [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
       [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch]
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
       [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch]
       [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
       [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0
       [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0
       [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930
       [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930
       [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
       [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220
       [<ffffffff81661060>] dev_queue_xmit+0x10/0x20
       [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220
       [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590
       [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590
       [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590
       [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
       [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
       [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
       [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
       [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
       [<ffffffff816ab4c0>] ip_output+0x70/0x110
       [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
       [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
       [<ffffffff816abf89>] ip_send_skb+0x19/0x40
       [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
       [<ffffffff816df21a>] icmp_push_reply+0xea/0x120
       [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230
       [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70
       [<ffffffff816dfa06>] icmp_echo+0x36/0x70
       [<ffffffff816e0d11>] icmp_rcv+0x271/0x450
       [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0
       [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0
       [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0
       [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560
       [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560
       [<ffffffff816a5453>] ip_rcv+0x283/0x3e0
       [<ffffffff810b6302>] ? match_held_lock+0x192/0x200
       [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40
       [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0
       [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230
       [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
       [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60
       [<ffffffff8165e678>] process_backlog+0x78/0x230
       [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230
       [<ffffffff8165e355>] net_rx_action+0x155/0x400
       [<ffffffff8106b48c>] __do_softirq+0xcc/0x420
       [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590
       [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30
       <EOI>
       [<ffffffff8106b88e>] do_softirq+0x4e/0x60
       [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0
       [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590
       [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0
       [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
       [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
       [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
       [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
       [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
       [<ffffffff816ab4c0>] ip_output+0x70/0x110
       [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
       [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
       [<ffffffff816abf89>] ip_send_skb+0x19/0x40
       [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
       [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0
       [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0
       [<ffffffff8163e398>] sock_sendmsg+0x38/0x50
       [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270
       [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320
       [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460
       [<ffffffff81204886>] ? __fget_light+0x66/0x90
       [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80
       [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20
       [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f
      Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
       66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
      RIP  [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0
       RSP <ffff88006d603170>
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8282f274
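
      A hedged sketch of the invariant the commit enforces: orphan the skb (drop
      any socket reference) as soon as it enters IP defragmentation, so no
      fragment sitting on a frag queue can carry skb->sk into a later output
      path. The function name is illustrative; the real change lives inside
      ip_defrag().

      #include <linux/skbuff.h>
      #include <linux/types.h>
      #include <net/ip.h>

      /* Sketch mirroring ip_defrag()'s entry: any socket reference is dropped
       * up front, so the BUG_ON(skb->sk) in ip_do_fragment() can no longer
       * trigger when the reassembled packet is re-fragmented later.
       */
      static int ip_defrag_sketch(struct net *net, struct sk_buff *skb, u32 user)
      {
              skb_orphan(skb);        /* runs skb->destructor and clears skb->sk */

              /* ... find or create the frag queue and enqueue skb as before ... */
              return 0;
      }
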
    • sctp: remove the dead field of sctp_transport · 47faa1e4
      Authored by Xin Long
      Now that we use the refcnt to check whether a transport is alive, the
      dead field can be removed from sctp_transport.
      
      The traversal of transport_addr_list in the procfs dump uses
      list_for_each_entry_rcu(), so there is no need to check whether the
      transport has been freed.
      
      sctp_generate_t3_rtx_event() and sctp_generate_heartbeat_event() are
      protected by the sock lock, so it is not necessary to check dead there
      either. Also, the timers are cancelled when sctp_transport_free() is
      called; it does not wait for the refcnt to reach 0 to cancel them.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      47faa1e4
    • sctp: hold transport before we access t->asoc in sctp proc · fba4c330
      Authored by Xin Long
      Previously, before rhashtable, /proc assoc listing was done by
      read-locking the entire hash entry and dumping all assocs at once, so we
      were sure that the assoc wasn't freed because it wouldn't be possible to
      remove it from the hash meanwhile.
      
      Now we use rhashtable to list transports and dump entries one by one.
      That is, we now have to check whether the assoc is still a good one, as
      the transport we got may be in the process of being freed.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fba4c330
    • sctp: fix the transport dead race check by using atomic_add_unless on refcnt · 1eed6779
      Authored by Xin Long
      Now when __sctp_lookup_association() is running in BH context, it checks
      whether t->dead is set, but meanwhile other CPUs may be freeing this
      transport and its association. If __sctp_lookup_association() checks
      t->dead a bit too early, it may conclude that the association is still
      good while it has already been freed.
      
      So we fix this race by using atomic_add_unless() in sctp_transport_hold().
      After we get a transport from the hashtable, we hold it only if its
      refcnt is not 0, so that we can make sure t->asoc cannot be freed before
      we hold the asoc as well (a hedged sketch follows this entry).
      
      Note that the SCTP association is not freed using RCU, so we cannot use
      atomic_add_unless() with it, as by then it may already be too late.
      
      Fixes: 4f008781 ("sctp: apply rhashtable api to send/recv path")
      Reported-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1eed6779
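
      A hedged sketch of the conditional hold described above, assuming the
      transport refcount is still a plain atomic_t as in kernels of that era:
      the hold succeeds only if the count has not already dropped to zero, so a
      transport on its way to being freed is never resurrected.

      #include <linux/atomic.h>
      #include <net/sctp/structs.h>   /* struct sctp_transport */

      /* Sketch: returns non-zero only if the transport was still alive
       * (refcnt != 0) at the instant we tried to take a reference.
       */
      static int sctp_transport_hold_sketch(struct sctp_transport *t)
      {
              return atomic_add_unless(&t->refcnt, 1, 0);
      }

      /* Lookup side (sketch): only trust t->asoc after the hold succeeded.
       *
       *      t = rhashtable_lookup_fast(...);
       *      if (t && sctp_transport_hold_sketch(t)) {
       *              asoc = t->asoc;         // cannot be freed under us now
       *              ...
       *      }
       */
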
  2. 26 Jan 2016 (4 commits)
  3. 25 Jan 2016 (2 commits)
  4. 22 Jan 2016 (1 commit)
    • tcp: fix NULL deref in tcp_v4_send_ack() · e62a123b
      Authored by Eric Dumazet
      Neal reported crashes with this stack trace:
      
       RIP: 0010:[<ffffffff8c57231b>] tcp_v4_send_ack+0x41/0x20f
      ...
       CR2: 0000000000000018 CR3: 000000044005c000 CR4: 00000000001427e0
      ...
        [<ffffffff8c57258e>] tcp_v4_reqsk_send_ack+0xa5/0xb4
        [<ffffffff8c1a7caa>] tcp_check_req+0x2ea/0x3e0
        [<ffffffff8c19e420>] tcp_rcv_state_process+0x850/0x2500
        [<ffffffff8c1a6d21>] tcp_v4_do_rcv+0x141/0x330
        [<ffffffff8c56cdb2>] sk_backlog_rcv+0x21/0x30
        [<ffffffff8c098bbd>] tcp_recvmsg+0x75d/0xf90
        [<ffffffff8c0a8700>] inet_recvmsg+0x80/0xa0
        [<ffffffff8c17623e>] sock_aio_read+0xee/0x110
        [<ffffffff8c066fcf>] do_sync_read+0x6f/0xa0
        [<ffffffff8c0673a1>] SyS_read+0x1e1/0x290
        [<ffffffff8c5ca262>] system_call_fastpath+0x16/0x1b
      
      The problem here is that the skb we provide to tcp_v4_send_ack() had to
      be parked in the backlog of a new TCP fastopen child because this child
      was owned by the user at the time an out-of-window packet arrived.
      
      Before queuing a packet, TCP has to set skb->dev to NULL as the device
      could disappear before the packet is removed from the queue.
      
      Fix this issue by using the net pointer provided by the socket (either a
      timewait or a request socket); a hedged sketch follows this entry.
      
      IPv6 is immune to the bug: tcp_v6_send_response() already gets the net
      pointer from the socket if provided.
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Reported-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jerry Chu <hkchu@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e62a123b
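
      A hedged sketch of the idea (parameter list simplified, not the exact
      upstream signature): the ACK-sending helper is handed the listener or
      timewait socket and derives the namespace from it with sock_net(), instead
      of dereferencing skb->dev, which TCP clears before parking packets on a
      backlog.

      #include <net/sock.h>

      static void tcp_v4_send_ack_sketch(const struct sock *sk, struct sk_buff *skb)
      {
              struct net *net = sock_net(sk); /* sk is never NULL at this call site */

              /* ... build the ACK and transmit it via the per-netns TCP control
               * socket (ip_send_unicast_reply()), using @net rather than
               * skb->dev, which may have been set to NULL while the skb sat on
               * a socket backlog ...
               */
              (void)net;
              (void)skb;
      }
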
  5. 21 Jan 2016 (2 commits)
  6. 20 Jan 2016 (4 commits)
  7. 19 Jan 2016 (1 commit)
    • ovs: limit ovs recursions in ovs_execute_actions to not corrupt stack · b064d0d8
      Authored by Hannes Frederic Sowa
      It was seen that defective configurations of openvswitch could overwrite
      the STACK_END_MAGIC and cause a hard crash of the kernel because of too
      many recursions within ovs.
      
      This problem arises due to the high stack usage of openvswitch. The rest
      of the kernel is fine with the current limit of 10 (RECURSION_LIMIT).
      
      We use the already existing recursion counter in ovs_execute_actions()
      to implement an upper bound of 5 recursions (a hedged sketch of such a
      guard follows this entry).
      
      Cc: Pravin Shelar <pshelar@ovn.org>
      Cc: Simon Horman <simon.horman@netronome.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b064d0d8
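
      A hedged sketch of such a recursion guard, with illustrative names (the
      upstream patch reuses the existing per-CPU action-execution level counter):
      the entry point refuses to nest more than a small fixed number of times and
      drops the packet instead of risking stack overflow.

      #include <linux/errno.h>
      #include <linux/kernel.h>
      #include <linux/net.h>          /* net_crit_ratelimited() */
      #include <linux/percpu.h>

      #define OVS_EXEC_RECURSION_LIMIT_SKETCH 5       /* bound named in the commit text */

      static DEFINE_PER_CPU(int, ovs_exec_level_sketch);

      /* Real parameters (datapath, skb, actions, flow key) elided for brevity. */
      static int ovs_execute_actions_sketch(void)
      {
              int err = 0;
              int level = this_cpu_inc_return(ovs_exec_level_sketch);

              if (unlikely(level > OVS_EXEC_RECURSION_LIMIT_SKETCH)) {
                      net_crit_ratelimited("ovs: recursion limit reached, dropping packet\n");
                      err = -ENETDOWN;        /* drop instead of recursing further */
                      goto out;
              }

              /* ... do_execute_actions(), which may re-enter this function via an
               * output action to an internal port ... */
      out:
              this_cpu_dec(ovs_exec_level_sketch);
              return err;
      }
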
  8. 18 Jan 2016 (3 commits)
  9. 16 Jan 2016 (13 commits)
    • batman-adv: Drop immediate orig_node free function · 42eff6a6
      Authored by Sven Eckelmann
      It is not allowed to free the memory of an object which is part of a list
      that is protected by RCU read-side critical sections without making sure
      that no other context is accessing the object anymore. This is usually done
      by removing the references to this object and then waiting until the RCU
      grace period is over and no one (legitimately) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided, so these functions must be removed and all callers have to
      use batadv_orig_node_free_ref (a hedged sketch of this reference-counting
      pattern follows this entry).
      
      Fixes: 72822225 ("batman-adv: Fix rcu_barrier() miss due to double call_rcu() in TT code")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      42eff6a6
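
      A hedged sketch, with illustrative types and names, of the reference-counting
      pattern this series of batman-adv fixes converges on: the only way to release
      an object on an RCU-protected list is to drop a reference, and the final free
      is always deferred through call_rcu().

      #include <linux/kref.h>
      #include <linux/rcupdate.h>
      #include <linux/slab.h>

      struct example_orig_node {
              struct kref refcount;
              struct rcu_head rcu;
              /* ... payload ... */
      };

      static void example_orig_node_free_rcu(struct rcu_head *rcu)
      {
              kfree(container_of(rcu, struct example_orig_node, rcu));
      }

      static void example_orig_node_release(struct kref *ref)
      {
              struct example_orig_node *node =
                      container_of(ref, struct example_orig_node, refcount);

              /* Never kfree() directly here: readers inside rcu_read_lock() may
               * still be walking the list entry that pointed at this node.
               */
              call_rcu(&node->rcu, example_orig_node_free_rcu);
      }

      void example_orig_node_free_ref(struct example_orig_node *node)
      {
              kref_put(&node->refcount, example_orig_node_release);
      }
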
    • batman-adv: Drop immediate batadv_hard_iface free function · b4d922cf
      Authored by Sven Eckelmann
      It is not allowed to free the memory of an object which is part of a list
      that is protected by RCU read-side critical sections without making sure
      that no other context is accessing the object anymore. This is usually done
      by removing the references to this object and then waiting until the RCU
      grace period is over and no one (legitimately) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided, so these functions must be removed and all callers have to
      use batadv_hardif_free_ref.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      b4d922cf
    • batman-adv: Drop immediate neigh_ifinfo free function · ae3e1e36
      Authored by Sven Eckelmann
      It is not allowed to free the memory of an object which is part of a list
      that is protected by RCU read-side critical sections without making sure
      that no other context is accessing the object anymore. This is usually done
      by removing the references to this object and then waiting until the RCU
      grace period is over and no one (legitimately) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided, so these functions must be removed and all callers have to
      use batadv_neigh_ifinfo_free_ref.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      ae3e1e36
    • batman-adv: Drop immediate batadv_hardif_neigh_node free function · f6389692
      Authored by Sven Eckelmann
      It is not allowed to free the memory of an object which is part of a list
      that is protected by RCU read-side critical sections without making sure
      that no other context is accessing the object anymore. This is usually done
      by removing the references to this object and then waiting until the RCU
      grace period is over and no one (legitimately) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided, so these functions must be removed and all callers have to
      use batadv_hardif_neigh_free_ref.
      
      Fixes: cef63419 ("batman-adv: add list of unique single hop neighbors per hard-interface")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      f6389692
    • batman-adv: Drop immediate batadv_neigh_node free function · 2baa753c
      Authored by Sven Eckelmann
      It is not allowed to free the memory of an object which is part of a list
      that is protected by RCU read-side critical sections without making sure
      that no other context is accessing the object anymore. This is usually done
      by removing the references to this object and then waiting until the RCU
      grace period is over and no one (legitimately) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided, so these functions must be removed and all callers have to
      use batadv_neigh_node_free_ref.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      2baa753c
    • batman-adv: Drop immediate batadv_orig_ifinfo free function · deed9660
      Authored by Sven Eckelmann
      It is not allowed to free the memory of an object which is part of a list
      that is protected by RCU read-side critical sections without making sure
      that no other context is accessing the object anymore. This is usually done
      by removing the references to this object and then waiting until the RCU
      grace period is over and no one (legitimately) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided, so these functions must be removed and all callers have to
      use batadv_orig_ifinfo_free_ref.
      
      Fixes: 7351a482 ("batman-adv: split out router from orig_node")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      deed9660
    • batman-adv: Avoid recursive call_rcu for batadv_nc_node · 44e8e7e9
      Authored by Sven Eckelmann
      The batadv_nc_node_free_ref function uses call_rcu to delay the free of the
      batadv_nc_node object until no already-started rcu_read_lock section can
      still be running. This makes sure that no context is still trying to access
      the object which should be removed. But batadv_nc_node also contains a
      reference to orig_node which must be dropped.
      
      The reference drop of orig_node was done in the call_rcu function
      batadv_nc_node_free_rcu but should actually be done in the
      batadv_nc_node_release function to avoid nested call_rcu calls. This is
      important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will
      not detect the inner call_rcu as relevant for its execution. Otherwise this
      barrier will most likely be inserted into the queue before the callback of
      the first call_rcu has executed. The caller of rcu_barrier will therefore
      continue to run before the inner call_rcu callback has finished (a hedged
      sketch of the corrected release order follows this entry).
      
      Fixes: d56b1705 ("batman-adv: network coding - detect coding nodes and remove these after timeout")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      44e8e7e9
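
      A hedged sketch of the release ordering this fix establishes, reusing the
      illustrative example_orig_node helpers from the sketch further above: the
      reference to the embedded parent is dropped synchronously in the release
      function, and only the object's own free is deferred, so rcu_barrier()
      never has to chase a nested call_rcu().

      #include <linux/kref.h>
      #include <linux/rcupdate.h>
      #include <linux/slab.h>

      struct example_orig_node;                       /* see the earlier sketch */
      void example_orig_node_free_ref(struct example_orig_node *node);

      struct example_nc_node {
              struct kref refcount;
              struct rcu_head rcu;
              struct example_orig_node *orig_node;    /* holds its own reference */
      };

      static void example_nc_node_release(struct kref *ref)
      {
              struct example_nc_node *nc_node =
                      container_of(ref, struct example_nc_node, refcount);

              /* Drop the parent reference now, not inside an RCU callback, so a
               * later rcu_barrier() only has to wait for our own deferred free.
               */
              example_orig_node_free_ref(nc_node->orig_node);
              kfree_rcu(nc_node, rcu);        /* defer only the nc_node free */
      }
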
    • batman-adv: Avoid recursive call_rcu for batadv_bla_claim · 63b39927
      Authored by Sven Eckelmann
      The batadv_claim_free_ref function uses call_rcu to delay the free of the
      batadv_bla_claim object until no already-started rcu_read_lock section can
      still be running. This makes sure that no context is still trying to access
      the object which should be removed. But batadv_bla_claim also contains a
      reference to backbone_gw which must be dropped.
      
      The reference drop of backbone_gw was done in the call_rcu function
      batadv_claim_free_rcu but should actually be done in the
      batadv_claim_release function to avoid nested call_rcu calls. This is
      important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will
      not detect the inner call_rcu as relevant for its execution. Otherwise this
      barrier will most likely be inserted into the queue before the callback of
      the first call_rcu has executed. The caller of rcu_barrier will therefore
      continue to run before the inner call_rcu callback has finished.
      
      Fixes: 23721387 ("batman-adv: add basic bridge loop avoidance code")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      63b39927
    • bridge: fix lockdep addr_list_lock false positive splat · c6894dec
      Authored by Nikolay Aleksandrov
      After promisc mode management was introduced a bridge device could do
      dev_set_promiscuity from its ndo_change_rx_flags() callback which in
      turn can be called after the bridge's addr_list_lock has been taken
      (e.g. by dev_uc_add). This causes a false positive lockdep splat because
      the port interfaces' addr_list_lock is taken when br_manage_promisc()
      runs after the bridge's addr list lock was already taken.
      To remove the false positive, introduce a custom bridge addr_list_lock
      class and set it at bridge init time (a hedged sketch follows this entry).
      A simple way to reproduce this is with the following:
      $ brctl addbr br0
      $ ip l add l br0 br0.100 type vlan id 100
      $ ip l set br0 up
      $ ip l set br0.100 up
      $ echo 1 > /sys/class/net/br0/bridge/vlan_filtering
      $ brctl addif br0 eth0
      Splat:
      [   43.684325] =============================================
      [   43.684485] [ INFO: possible recursive locking detected ]
      [   43.684636] 4.4.0-rc8+ #54 Not tainted
      [   43.684755] ---------------------------------------------
      [   43.684906] brctl/1187 is trying to acquire lock:
      [   43.685047]  (_xmit_ETHER){+.....}, at: [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
      [   43.685460]  but task is already holding lock:
      [   43.685618]  (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
      [   43.686015]  other info that might help us debug this:
      [   43.686316]  Possible unsafe locking scenario:
      
      [   43.686743]        CPU0
      [   43.686967]        ----
      [   43.687197]   lock(_xmit_ETHER);
      [   43.687544]   lock(_xmit_ETHER);
      [   43.687886] *** DEADLOCK ***
      
      [   43.688438]  May be due to missing lock nesting notation
      
      [   43.688882] 2 locks held by brctl/1187:
      [   43.689134]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81510317>] rtnl_lock+0x17/0x20
      [   43.689852]  #1:  (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
      [   43.690575] stack backtrace:
      [   43.690970] CPU: 0 PID: 1187 Comm: brctl Not tainted 4.4.0-rc8+ #54
      [   43.691270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
      [   43.691770]  ffffffff826a25c0 ffff8800369fb8e0 ffffffff81360ceb ffffffff826a25c0
      [   43.692425]  ffff8800369fb9b8 ffffffff810d0466 ffff8800369fb968 ffffffff81537139
      [   43.693071]  ffff88003a08c880 0000000000000000 00000000ffffffff 0000000002080020
      [   43.693709] Call Trace:
      [   43.693931]  [<ffffffff81360ceb>] dump_stack+0x4b/0x70
      [   43.694199]  [<ffffffff810d0466>] __lock_acquire+0x1e46/0x1e90
      [   43.694483]  [<ffffffff81537139>] ? netlink_broadcast_filtered+0x139/0x3e0
      [   43.694789]  [<ffffffff8153b5da>] ? nlmsg_notify+0x5a/0xc0
      [   43.695064]  [<ffffffff810d10f5>] lock_acquire+0xe5/0x1f0
      [   43.695340]  [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
      [   43.695623]  [<ffffffff815edea5>] _raw_spin_lock_bh+0x45/0x80
      [   43.695901]  [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
      [   43.696180]  [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
      [   43.696460]  [<ffffffff8150189c>] dev_set_promiscuity+0x3c/0x50
      [   43.696750]  [<ffffffffa0586845>] br_port_set_promisc+0x25/0x50 [bridge]
      [   43.697052]  [<ffffffffa05869aa>] br_manage_promisc+0x8a/0xe0 [bridge]
      [   43.697348]  [<ffffffffa05826ee>] br_dev_change_rx_flags+0x1e/0x20 [bridge]
      [   43.697655]  [<ffffffff81501532>] __dev_set_promiscuity+0x132/0x1f0
      [   43.697943]  [<ffffffff81501672>] __dev_set_rx_mode+0x82/0x90
      [   43.698223]  [<ffffffff815072de>] dev_uc_add+0x5e/0x80
      [   43.698498]  [<ffffffffa05b3c62>] vlan_device_event+0x542/0x650 [8021q]
      [   43.698798]  [<ffffffff8109886d>] notifier_call_chain+0x5d/0x80
      [   43.699083]  [<ffffffff810988b6>] raw_notifier_call_chain+0x16/0x20
      [   43.699374]  [<ffffffff814f456e>] call_netdevice_notifiers_info+0x6e/0x80
      [   43.699678]  [<ffffffff814f4596>] call_netdevice_notifiers+0x16/0x20
      [   43.699973]  [<ffffffffa05872be>] br_add_if+0x47e/0x4c0 [bridge]
      [   43.700259]  [<ffffffffa058801e>] add_del_if+0x6e/0x80 [bridge]
      [   43.700548]  [<ffffffffa0588b5f>] br_dev_ioctl+0xaf/0xc0 [bridge]
      [   43.700836]  [<ffffffff8151a7ac>] dev_ifsioc+0x30c/0x3c0
      [   43.701106]  [<ffffffff8151aac9>] dev_ioctl+0xf9/0x6f0
      [   43.701379]  [<ffffffff81254345>] ? mntput_no_expire+0x5/0x450
      [   43.701665]  [<ffffffff812543ee>] ? mntput_no_expire+0xae/0x450
      [   43.701947]  [<ffffffff814d7b02>] sock_do_ioctl+0x42/0x50
      [   43.702219]  [<ffffffff814d8175>] sock_ioctl+0x1e5/0x290
      [   43.702500]  [<ffffffff81242d0b>] do_vfs_ioctl+0x2cb/0x5c0
      [   43.702771]  [<ffffffff81243079>] SyS_ioctl+0x79/0x90
      [   43.703033]  [<ffffffff815eebb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
      
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Bridge list <bridge@lists.linux-foundation.org>
      CC: Andy Gospodarek <gospo@cumulusnetworks.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Fixes: 2796d0c6 ("bridge: Automatically manage port promiscuous mode.")
      Reported-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c6894dec
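
      A hedged sketch of the lockdep annotation described in the bridge commit
      above (helper name illustrative; in the real driver this belongs in the
      bridge device's init path): giving the bridge's addr_list_lock its own
      class means nesting it with a port's lock no longer looks like same-class
      recursion.

      #include <linux/lockdep.h>
      #include <linux/netdevice.h>

      /* One class key shared by all bridge devices, distinct from the default
       * _xmit_ETHER class used by ordinary Ethernet devices (the ports).
       */
      static struct lock_class_key bridge_addr_list_lock_key;

      static int br_dev_init_sketch(struct net_device *dev)
      {
              lockdep_set_class(&dev->addr_list_lock, &bridge_addr_list_lock_key);
              /* ... remaining bridge netdev initialisation ... */
              return 0;
      }
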
    • net: sctp: Move sequence start handling into sctp_transport_get_idx() · fb331185
      Authored by Geert Uytterhoeven
      net/sctp/proc.c: In function ‘sctp_transport_get_idx’:
      net/sctp/proc.c:313: warning: ‘obj’ may be used uninitialized in this function
      
      This is currently a false positive, as all callers check for a zero
      offset first, and handle this case in the exact same way.
      
      Move the check and handling into sctp_transport_get_idx() to kill the
      compiler warning and avoid future bugs (a hedged sketch follows this
      entry).
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fb331185
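
      A hedged sketch of the refactor: the zero-offset case returns
      SEQ_START_TOKEN from inside the helper itself, so `obj` is initialised on
      every path and the callers' duplicated special case disappears.
      sctp_transport_get_next() is the existing iterator in net/sctp/proc.c.

      #include <linux/err.h>
      #include <linux/seq_file.h>

      void *sctp_transport_get_next(struct seq_file *seq);   /* existing iterator */

      static void *sctp_transport_get_idx_sketch(struct seq_file *seq, loff_t pos)
      {
              void *obj = SEQ_START_TOKEN;    /* position 0: the header row */

              while (pos && (obj = sctp_transport_get_next(seq)) && !IS_ERR(obj))
                      pos--;

              return obj;
      }
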
    • ipv6: update skb->csum when CE mark is propagated · 34ae6a1a
      Authored by Eric Dumazet
      When a tunnel decapsulates the outer header, it has to comply with
      RFC 6040 and, when applicable, propagate the CE mark into the inner
      header.
      
      It turns out IP6_ECN_set_ce() does not correctly update skb->csum for
      CHECKSUM_COMPLETE packets, triggering the infamous "hw csum failure"
      messages and stack traces (a hedged sketch of the checksum update
      follows this entry).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34ae6a1a
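
      A hedged sketch of the checksum maintenance involved (simplified from the
      inet_ecn.h helper): when setting CE flips bits in the first 32-bit word of
      the IPv6 header and the skb carries CHECKSUM_COMPLETE, fold the old and new
      words into skb->csum so the stored checksum stays valid.

      #include <linux/ipv6.h>
      #include <linux/skbuff.h>
      #include <net/checksum.h>
      #include <net/inet_ecn.h>

      static void ip6_set_ce_sketch(struct sk_buff *skb, struct ipv6hdr *iph)
      {
              __be32 from = *(__be32 *)iph;
              __be32 to = from | htonl(INET_ECN_CE << 20);    /* ECN bits of the TC field */

              *(__be32 *)iph = to;
              if (skb->ip_summed == CHECKSUM_COMPLETE)
                      skb->csum = csum_add(csum_sub(skb->csum, (__force __wsum)from),
                                           (__force __wsum)to);
      }
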
    • sctp: support to lookup with ep+paddr in transport rhashtable · 65a5124a
      Authored by Xin Long
      Now, when we sendmsg, we translate the ep to laddr by selecting the
      first element of the list, and then do a lookup for a transport.
      
      But sctp_hash_cmp() will compare it against asoc addr_list, which may
      be a subset of ep addr_list, meaning that this chosen laddr may not be
      there, and thus making it impossible to find the transport.
      
      So we fix it by using ep + paddr to look up transports in the hashtable.
      In sctp_hash_cmp(), if .ep is set we check whether this ep == asoc->ep;
      otherwise we fall back to the laddr check (a hedged sketch follows this
      entry).
      
      Fixes: d6c0256a ("sctp: add the rhashtable apis for sctp global transport hashtable")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reported-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      65a5124a
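
      A hedged sketch of the comparison the commit describes; the key struct and
      the laddr helper below are hypothetical stand-ins for the private
      definitions in net/sctp/input.c.

      #include <linux/types.h>
      #include <net/sctp/structs.h>

      /* Hypothetical lookup key: ep is non-NULL for sendmsg-style (ep + paddr)
       * lookups, otherwise laddr is used as before.
       */
      struct sctp_hash_key_sketch {
              const struct sctp_endpoint *ep;
              const union sctp_addr *laddr;
      };

      /* Hypothetical stand-in for "is this laddr bound to the association?". */
      bool laddr_bound_to_asoc_sketch(const struct sctp_association *asoc,
                                      const union sctp_addr *laddr);

      static bool sctp_hash_cmp_sketch(const struct sctp_hash_key_sketch *key,
                                       const struct sctp_transport *t)
      {
              if (key->ep)                            /* ep + paddr lookup */
                      return key->ep == t->asoc->ep;
              return laddr_bound_to_asoc_sketch(t->asoc, key->laddr);
      }
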
    • net: preserve IP control block during GSO segmentation · 9207f9d4
      Authored by Konstantin Khlebnikov
      skb_gso_segment() uses the skb control block during segmentation.
      This patch adds 32 bytes of room for the previous control block, which
      is then copied into all resulting segments.
      
      This fixes a kernel crash when fragmenting forwarded packets, since
      fragmentation requires a valid IP control block in the skb for clearing
      IP options. The patch also removes the custom save/restore in the OVS
      code, which is now redundant (a hedged sketch of the idea follows this
      entry).
      Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
      Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9207f9d4
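
      A hedged sketch of the effect the patch achieves (the upstream
      implementation reserves the room inside skb->cb itself rather than copying
      around it as done here; the helper name and constant are illustrative):
      the caller's IP control block survives GSO segmentation and is visible in
      every resulting segment.

      #include <linux/err.h>
      #include <linux/netdevice.h>
      #include <linux/skbuff.h>
      #include <linux/string.h>

      #define IP_CB_ROOM_SKETCH 32    /* bytes of skb->cb reserved for the IP control block */

      static struct sk_buff *gso_segment_preserving_cb_sketch(struct sk_buff *skb,
                                                              netdev_features_t features)
      {
              unsigned char saved_cb[IP_CB_ROOM_SKETCH];
              struct sk_buff *segs, *seg;

              memcpy(saved_cb, skb->cb, sizeof(saved_cb));
              segs = skb_gso_segment(skb, features);          /* existing kernel helper */
              if (IS_ERR_OR_NULL(segs))
                      return segs;

              /* Replay the original control block into each segment so later IP
               * code (e.g. fragmentation reading IPCB(seg)) sees valid data.
               */
              for (seg = segs; seg; seg = seg->next)
                      memcpy(seg->cb, saved_cb, sizeof(saved_cb));

              return segs;
      }
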
  10. 15 Jan 2016 (4 commits)