1. 25 Feb 2016 (2 commits)
  2. 10 Feb 2016 (1 commit)
    • vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices · 7e059158
      Committed by David Wragg
      Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
      transmit vxlan packets of any size, constrained only by the ability to
      send out the resulting packets.  4.3 introduced netdevs corresponding
      to tunnel vports.  These netdevs have an MTU, which limits the size of
      a packet that can be successfully encapsulated.  The default MTU
      values are low (1500 or less), which is awkwardly small in the context
      of physical networks supporting jumbo frames, and leads to a
      conspicuous change in behaviour for userspace.
      
      Instead, set the MTU on openvswitch-created netdevs to be the relevant
      maximum (i.e. the maximum IP packet size minus any relevant overhead),
      effectively restoring the behaviour prior to 4.3.
      Signed-off-by: David Wragg <david@weave.works>
      Signed-off-by: David S. Miller <davem@davemloft.net>
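      A minimal, kernel-style sketch of the idea (the function name and the
      overhead figures below are illustrative assumptions, not taken from the
      patch): the created netdev's MTU ends up as the maximum IP packet size
      minus the encapsulation overhead, rather than the 1500-byte default.

       /* Hedged sketch: give an ovs-created tunnel netdev a large MTU. */
       static void ovs_tunnel_set_large_mtu(struct net_device *dev)
       {
               /* e.g. vxlan: outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14) */
               int encap_overhead = 20 + 8 + 8 + 14;

               dev->mtu = 65535 - encap_overhead;
       }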
  3. 09 Feb 2016 (4 commits)
  4. 08 Feb 2016 (2 commits)
  5. 06 Feb 2016 (1 commit)
    • ipv6: addrconf: Fix recursive spin lock call · 16186a82
      Committed by subashab@codeaurora.org
      An RCU stall with the following backtrace was seen on a system with
      forwarding, optimistic_dad and use_optimistic set. To reproduce,
      set these flags and allow ipv6 autoconf.
      
      This occurs because the device write lock is acquired while the read
      lock is already held. Backtrace below:
      
      INFO: rcu_preempt self-detected stall on CPU { 1}  (t=2100 jiffies
       g=3992 c=3991 q=4471)
      <6> Task dump for CPU 1:
      <2> kworker/1:0     R  running task    12168    15   2 0x00000002
      <2> Workqueue: ipv6_addrconf addrconf_dad_work
      <6> Call trace:
      <2> [<ffffffc000084da8>] el1_irq+0x68/0xdc
      <2> [<ffffffc000cc4e0c>] _raw_write_lock_bh+0x20/0x30
      <2> [<ffffffc000bc5dd8>] __ipv6_dev_ac_inc+0x64/0x1b4
      <2> [<ffffffc000bcbd2c>] addrconf_join_anycast+0x9c/0xc4
      <2> [<ffffffc000bcf9f0>] __ipv6_ifa_notify+0x160/0x29c
      <2> [<ffffffc000bcfb7c>] ipv6_ifa_notify+0x50/0x70
      <2> [<ffffffc000bd035c>] addrconf_dad_work+0x314/0x334
      <2> [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
      <2> [<ffffffc0000b7324>] worker_thread+0x2f8/0x418
      <2> [<ffffffc0000bb40c>] kthread+0xe0/0xec
      
      v2: do addrconf_dad_kick inside read lock and then acquire write
      lock for ipv6_ifa_notify as suggested by Eric
      
      Fixes: 7fd2561e ("net: ipv6: Add a sysctl to make optimistic
      addresses useful candidates")
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Erik Kline <ek@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
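      A hedged sketch of the v2 ordering described above (only the locking
      order is shown; surrounding code and error handling are omitted):

       read_lock_bh(&idev->lock);
       addrconf_dad_kick(ifp);            /* safe under the read lock */
       read_unlock_bh(&idev->lock);

       /* Only after the read lock is dropped: ipv6_ifa_notify() can reach
        * __ipv6_dev_ac_inc(), which takes the device write lock and must
        * not nest inside the read lock. */
       ipv6_ifa_notify(RTM_NEWADDR, ifp);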
  6. 05 Feb 2016 (5 commits)
  7. 30 Jan 2016 (10 commits)
  8. 29 Jan 2016 (11 commits)
    • Bluetooth: Fix incorrect removing of IRKs · cff10ce7
      Committed by Johan Hedberg
      The commit cad20c27 was supposed to
      fix handling of devices first using public addresses and then
      switching to RPAs after pairing. Unfortunately it missed a couple of
      key places in the code.
      
      1. When evaluating which devices should be removed from the existing
      white list we also need to consider whether we have an IRK for them or
      not, i.e. a call to hci_find_irk_by_addr() is needed.
      
      2. In smp_notify_keys() we should not require knowledge of the RPA,
      but should simply keep the IRK around if the other conditions call
      for it.
      Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Cc: stable@vger.kernel.org # 4.4+
    • Bluetooth: L2CAP: Fix setting chan src info before adding PSM/CID · a2342c5f
      Committed by Johan Hedberg
      At least the l2cap_add_psm() routine depends on the source address
      type being properly set to know what auto-allocation ranges to use, so
      the assignment to l2cap_chan needs to happen before this.
      Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
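      A hedged sketch of the ordering fix (the 'src', 'src_type' and 'psm'
      variables are hypothetical stand-ins): the channel's source info is
      assigned first, so l2cap_add_psm() sees the correct address type when
      it picks an auto-allocation range.

       bacpy(&chan->src, src);
       chan->src_type = src_type;

       /* Only now can PSM allocation choose the BR/EDR vs. LE range correctly. */
       err = l2cap_add_psm(chan, src, psm);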
    • Bluetooth: L2CAP: Fix auto-allocating LE PSM values · 92594a51
      Committed by Johan Hedberg
      The LE dynamic PSM range (0x0080 - 0x00ff) is different from BR/EDR
      and doesn't have any parity requirements, so separate checks are
      needed.
      Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
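      A hedged sketch of a separate LE allocation loop (psm_in_use() is a
      hypothetical stand-in for the real lookup): LE dynamic PSMs run from
      0x0080 to 0x00ff and, unlike BR/EDR, have no odd/even parity rule.

       u16 p;

       for (p = 0x0080; p <= 0x00ff; p++) {
               if (!psm_in_use(p)) {
                       chan->psm   = cpu_to_le16(p);
                       chan->sport = cpu_to_le16(p);
                       break;
               }
       }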
    • Bluetooth: L2CAP: Introduce proper defines for PSM ranges · 114f9f1e
      Committed by Johan Hedberg
      Having proper defines makes the code a bit more readable, and it also
      avoids duplicating hard-coded values, since these are also needed when
      auto-allocating PSM values (in a subsequent patch).
      Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
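      A hedged sketch of what such defines could look like (the LE values come
      from the previous entry; the BR/EDR values reflect the usual dynamic PSM
      range and are an assumption here, not quoted from the patch):

       /* BR/EDR dynamic PSMs (valid PSMs have an odd least-significant byte) */
       #define L2CAP_PSM_DYN_START     0x1001
       #define L2CAP_PSM_AUTO_END      0x10ff
       /* LE dynamic PSMs, no parity requirement */
       #define L2CAP_PSM_LE_DYN_START  0x0080
       #define L2CAP_PSM_LE_DYN_END    0x00ff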
    • tcp: beware of alignments in tcp_get_info() · ff5d7497
      Committed by Eric Dumazet
      With some combinations of user-provided flags in the netlink command,
      it is possible to call tcp_get_info() with a buffer that is not 8-byte
      aligned.
      
      It does matter on some arches, so we need to use put_unaligned() to
      store the u64 fields.
      
      The current iproute2 package does not trigger this particular issue.
      
      Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
      Fixes: 977cb0ec ("tcp: add pacing_rate information into tcp_info")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
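      A hedged sketch of the kind of change implied above: the u64 fields are
      written with put_unaligned() instead of a plain assignment, since the
      destination buffer may not be 8-byte aligned.

       /* before: assumes &info->tcpi_bytes_acked is 8-byte aligned */
       info->tcpi_bytes_acked = tp->bytes_acked;

       /* after (sketch): safe for an unaligned destination */
       put_unaligned(tp->bytes_acked, &info->tcpi_bytes_acked);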
    • switchdev: Require RTNL mutex to be held when sending FDB notifications · 4f2c6ae5
      Committed by Ido Schimmel
      When switchdev drivers process FDB notifications from the underlying
      device, they resolve the netdev to which the entry points and notify
      the bridge using the switchdev notifier.
      
      However, since the RTNL mutex is not held there is nothing preventing
      the netdev from disappearing in the middle, which will cause
      br_switchdev_event() to dereference a non-existing netdev.
      
      Make switchdev drivers hold the lock at the beginning of the
      notification processing session and release it once it ends, after
      notifying the bridge.
      
      Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
      when RTNL mutex is held.
      
      Fixes: 03bf0c28 ("switchdev: introduce switchdev notifier")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
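      A hedged sketch of the driver-side pattern described above (the event
      and the fdb_info variable are illustrative): the whole FDB notification
      processing session runs under the RTNL mutex.

       rtnl_lock();
       /* Resolve the netdev the FDB entry points to; it cannot disappear
        * while RTNL is held, so br_switchdev_event() sees a live device. */
       call_switchdev_notifiers(SWITCHDEV_FDB_ADD, dev, &fdb_info.info);
       rtnl_unlock();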
    • tcp: fix tcp_mark_head_lost to check skb len before fragmenting · d88270ee
      Committed by Neal Cardwell
      This commit fixes a corner case in tcp_mark_head_lost() which was
      causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.
      
      tcp_mark_head_lost() was assuming that if a packet has
      tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
      M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
      maintained, this is not always true.
      
      For example, suppose the sender sends 4 1-byte packets and has the
      last 3 packets SACKed. It will merge the last 3 packets in the write
      queue into an skb with pcount = 3 and len = 3 bytes. If another
      recovery happens after a sack reneging event, tcp_mark_head_lost()
      may attempt to split the skb assuming it has more than 2*MSS bytes.
      
      This sounds very counterintuitive, but as the commit description for
      the related commit c0638c24 ("tcp: don't fragment SACKed skbs in
      tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
      coalesces adjacent regions of SACKed skbs, and when doing this it
      preserves the sum of their packet counts in order to reflect the
      real-world dynamics on the wire. The c0638c24 commit tried to
      avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
      where the non-proportionality between pcount and skb->len/mss is known
      to be possible. However, that commit did not handle the case where
      during a reneging event one of these weird SACKed skbs becomes an
      un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.
      
      The fix is to simply mark the entire skb lost when this happens.
      This makes the recovery slightly more aggressive in such corner
      cases before we detect reordering. But once reordering is detected,
      this code path is bypassed because FACK is disabled.
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
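      A hedged sketch of the corner case (the do_fragment local is purely
      illustrative): fragmenting is only attempted when the skb really carries
      more than one MSS of data; otherwise the whole skb is marked lost.

       if (tcp_skb_pcount(skb) > 1 && skb->len > mss) {
               /* safe: split off the prefix that is considered lost */
               do_fragment = true;
       } else {
               /* pcount and skb->len disagree (a SACK-coalesced skb that was
                * later reneged): mark the entire skb lost instead */
               do_fragment = false;
       }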
    • inet: frag: Always orphan skbs inside ip_defrag() · 8282f274
      Committed by Joe Stringer
      Later parts of the stack (including fragmentation) expect that there is
      never a socket attached to a fragment in a frag_list; however, this
      invariant was not enforced on all defrag paths. This could lead to the
      BUG_ON(skb->sk) in ip_do_fragment(), as shown by the call stack at the
      end of this commit message.
      
      While the call could be added to openvswitch to fix this particular
      error, the head and tail of the frags list are already orphaned
      indirectly inside ip_defrag(), so it seems like the remaining fragments
      should all be orphaned in all circumstances.
      
      kernel BUG at net/ipv4/ip_output.c:586!
      [...]
      Call Trace:
       <IRQ>
       [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
       [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch]
       [<ffffffff81667830>] ? dst_discard_out+0x20/0x20
       [<ffffffff81667810>] ? dst_ifdown+0x80/0x80
       [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
       [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch]
       [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch]
       [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch]
       [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
       [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
       [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch]
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
       [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch]
       [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
       [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0
       [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0
       [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930
       [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930
       [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
       [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220
       [<ffffffff81661060>] dev_queue_xmit+0x10/0x20
       [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220
       [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590
       [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590
       [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590
       [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
       [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
       [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
       [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
       [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
       [<ffffffff816ab4c0>] ip_output+0x70/0x110
       [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
       [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
       [<ffffffff816abf89>] ip_send_skb+0x19/0x40
       [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
       [<ffffffff816df21a>] icmp_push_reply+0xea/0x120
       [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230
       [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70
       [<ffffffff816dfa06>] icmp_echo+0x36/0x70
       [<ffffffff816e0d11>] icmp_rcv+0x271/0x450
       [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0
       [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0
       [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0
       [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560
       [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560
       [<ffffffff816a5453>] ip_rcv+0x283/0x3e0
       [<ffffffff810b6302>] ? match_held_lock+0x192/0x200
       [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40
       [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0
       [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230
       [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
       [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60
       [<ffffffff8165e678>] process_backlog+0x78/0x230
       [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230
       [<ffffffff8165e355>] net_rx_action+0x155/0x400
       [<ffffffff8106b48c>] __do_softirq+0xcc/0x420
       [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590
       [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30
       <EOI>
       [<ffffffff8106b88e>] do_softirq+0x4e/0x60
       [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0
       [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590
       [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0
       [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
       [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
       [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
       [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
       [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
       [<ffffffff816ab4c0>] ip_output+0x70/0x110
       [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
       [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
       [<ffffffff816abf89>] ip_send_skb+0x19/0x40
       [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
       [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0
       [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0
       [<ffffffff8163e398>] sock_sendmsg+0x38/0x50
       [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270
       [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320
       [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460
       [<ffffffff81204886>] ? __fget_light+0x66/0x90
       [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80
       [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20
       [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f
      Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
       66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
      RIP  [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0
       RSP <ffff88006d603170>
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
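      A hedged sketch of the change the title describes: the skb is orphaned
      at the top of ip_defrag(), so no socket reference can survive into the
      fragment queue regardless of which path requested defragmentation.

       /* in ip_defrag(), before the fragment is queued */
       skb_orphan(skb);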
    • sctp: remove the dead field of sctp_transport · 47faa1e4
      Committed by Xin Long
      Now that we use refcnt to check whether a transport is alive, the dead
      field can be removed from sctp_transport.
      
      The traversal of transport_addr_list in the procfs dump uses
      list_for_each_entry_rcu, so there is no need to check whether the entry
      has been freed.
      
      sctp_generate_t3_rtx_event and sctp_generate_heartbeat_event are
      protected by the sock lock, so it's not necessary to check dead there
      either. Also, the timers are cancelled when sctp_transport_free() is
      called; it doesn't wait for refcnt to reach 0 to cancel them.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: hold transport before we access t->asoc in sctp proc · fba4c330
      Committed by Xin Long
      Previously, before rhashtable, /proc assoc listing was done by
      read-locking the entire hash entry and dumping all assocs at once, so we
      were sure that the assoc wasn't freed because it wouldn't be possible to
      remove it from the hash meanwhile.
      
      Now we use rhashtable to list transports and dump entries one by one.
      That means we now have to check whether the assoc is still valid, as
      the transport we got may be in the process of being freed.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: fix the transport dead race check by using atomic_add_unless on refcnt · 1eed6779
      Committed by Xin Long
      When __sctp_lookup_association is running in BH context, it checks
      whether t->dead is set, but meanwhile other CPUs may be freeing this
      transport and its association. If __sctp_lookup_association happens to
      check t->dead a bit too early, it may think the association is still
      good while it has already been freed.
      
      So we fix this race by using atomic_add_unless in sctp_transport_hold.
      After we get a transport from the hashtable, we hold it only if its
      refcnt is not 0, so that we can make sure t->asoc cannot be freed
      before we hold the asoc again.
      
      Note that the sctp association is not freed via RCU, so we can't apply
      atomic_add_unless() to it as well; by then it may simply be too late
      for that.
      
      Fixes: 4f008781 ("sctp: apply rhashtable api to send/recv path")
      Reported-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
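      A hedged sketch of the approach (exact code may differ): the hold
      primitive succeeds only if the refcnt is still non-zero, and the lookup
      path touches t->asoc only after a successful hold.

       /* sketch of the hold primitive described above */
       int sctp_transport_hold(struct sctp_transport *t)
       {
               return atomic_add_unless(&t->refcnt, 1, 0);
       }

      On the lookup side, the transport returned by the rhashtable is used
      only if sctp_transport_hold() returns non-zero; otherwise the entry is
      treated as already dying and skipped.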
  9. 26 Jan 2016 (4 commits)