1. 23 7月, 2012 1 次提交
    • N
      sctp: Implement quick failover draft from tsvwg · 5aa93bcf
      Neil Horman 提交于
      I've seen several attempts recently made to do quick failover of sctp transports
      by reducing various retransmit timers and counters.  While its possible to
      implement a faster failover on multihomed sctp associations, its not
      particularly robust, in that it can lead to unneeded retransmits, as well as
      false connection failures due to intermittent latency on a network.
      
      Instead, lets implement the new ietf quick failover draft found here:
      http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
      
      This will let the sctp stack identify transports that have had a small number of
      errors, and avoid using them quickly until their reliability can be
      re-established.  I've tested this out on two virt guests connected via multiple
      isolated virt networks and believe its in compliance with the above draft and
      works well.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Sridhar Samudrala <sri@us.ibm.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      CC: joe@perches.com
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5aa93bcf
  2. 17 7月, 2012 1 次提交
    • N
      sctp: Fix list corruption resulting from freeing an association on a list · 2eebc1e1
      Neil Horman 提交于
      A few days ago Dave Jones reported this oops:
      
      [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
      [22766.295376] CPU 0
      [22766.295384] Modules linked in:
      [22766.387137]  ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
      ffff880147c03a74
      [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
      [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
      [22766.387137] Stack:
      [22766.387140]  ffff880147c03a10
      [22766.387140]  ffffffffa169f2b6
      [22766.387140]  ffff88013ed95728
      [22766.387143]  0000000000000002
      [22766.387143]  0000000000000000
      [22766.387143]  ffff880003fad062
      [22766.387144]  ffff88013c120000
      [22766.387144]
      [22766.387145] Call Trace:
      [22766.387145]  <IRQ>
      [22766.387150]  [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
      [sctp]
      [22766.387154]  [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
      [22766.387157]  [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
      [22766.387161]  [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
      [22766.387163]  [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
      [22766.387166]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
      [22766.387168]  [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
      [22766.387169]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
      [22766.387171]  [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
      [22766.387172]  [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
      [22766.387174]  [<ffffffff81590c54>] ip_rcv+0x214/0x320
      [22766.387176]  [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
      [22766.387178]  [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
      [22766.387180]  [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
      [22766.387182]  [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
      [22766.387183]  [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
      [22766.387185]  [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
      [22766.387187]  [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
      [22766.387218]  [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
      [22766.387242]  [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
      [22766.387266]  [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
      [22766.387268]  [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
      [22766.387270]  [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
      [22766.387273]  [<ffffffff810734d0>] __do_softirq+0xe0/0x420
      [22766.387275]  [<ffffffff8169826c>] call_softirq+0x1c/0x30
      [22766.387278]  [<ffffffff8101db15>] do_softirq+0xd5/0x110
      [22766.387279]  [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
      [22766.387281]  [<ffffffff81698b03>] do_IRQ+0x63/0xd0
      [22766.387283]  [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
      [22766.387283]  <EOI>
      [22766.387284]
      [22766.387285]  [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
      [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
      89 e5 48 83
      ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
      48 89 fb
      49 89 f5 66 c1 c0 08 66 39 46 02
      [22766.387307]
      [22766.387307] RIP
      [22766.387311]  [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
      [22766.387311]  RSP <ffff880147c039b0>
      [22766.387142]  ffffffffa16ab120
      [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
      [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt
      
      It appears from his analysis and some staring at the code that this is likely
      occuring because an association is getting freed while still on the
      sctp_assoc_hashtable.  As a result, we get a gpf when traversing the hashtable
      while a freed node corrupts part of the list.
      
      Nominally I would think that an mibalanced refcount was responsible for this,
      but I can't seem to find any obvious imbalance.  What I did note however was
      that the two places where we create an association using
      sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
      which free a newly created association after calling sctp_primitive_ASSOCIATE.
      sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
      issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
      the aforementioned hash table.  the sctp command interpreter that process side
      effects has not way to unwind previously processed commands, so freeing the
      association from the __sctp_connect or sctp_sendmsg error path would lead to a
      freed association remaining on this hash table.
      
      I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
      which allows us to proerly use hlist_unhashed to check if the node is on a
      hashlist safely during a delete.  That in turn alows us to safely call
      sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
      before freeing them, regardles of what the associations state is on the hash
      list.
      
      I noted, while I was doing this, that the __sctp_unhash_endpoint was using
      hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
      pointers to make that function work properly, so I fixed that up in a simmilar
      fashion.
      
      I attempted to test this using a virtual guest running the SCTP_RR test from
      netperf in a loop while running the trinity fuzzer, both in a loop.  I wasn't
      able to recreate the problem prior to this fix, nor was I able to trigger the
      failure after (neither of which I suppose is suprising).  Given the trace above
      however, I think its likely that this is what we hit.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reported-by: davej@redhat.com
      CC: davej@redhat.com
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Sridhar Samudrala <sri@us.ibm.com>
      CC: linux-sctp@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2eebc1e1
  3. 16 7月, 2012 1 次提交
  4. 16 5月, 2012 1 次提交
  5. 05 4月, 2012 1 次提交
    • T
      sctp: Allow struct sctp_event_subscribe to grow without breaking binaries · acdd5985
      Thomas Graf 提交于
      getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns
      an error if the user provides less bytes than the size of struct
      sctp_event_subscribe.
      
      Struct sctp_event_subscribe needs to be extended by an u8 for every
      new event or notification type that is added.
      
      This obviously makes getsockopt fail for binaries that are compiled
      against an older versions of <net/sctp/user.h> which do not contain
      all event types.
      
      This patch changes getsockopt behaviour to no longer return an error
      if not enough bytes are being provided by the user. Instead, it
      returns as much of sctp_event_subscribe as fits into the provided buffer.
      
      This leads to the new behavior that users see what they have been aware
      of at compile time.
      
      The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acdd5985
  6. 09 3月, 2012 1 次提交
  7. 20 12月, 2011 1 次提交
  8. 12 12月, 2011 1 次提交
  9. 23 11月, 2011 1 次提交
  10. 01 11月, 2011 1 次提交
  11. 09 7月, 2011 1 次提交
  12. 07 7月, 2011 1 次提交
  13. 02 7月, 2011 1 次提交
  14. 17 6月, 2011 1 次提交
  15. 12 6月, 2011 1 次提交
  16. 02 6月, 2011 3 次提交
  17. 13 5月, 2011 2 次提交
  18. 28 4月, 2011 1 次提交
  19. 22 4月, 2011 1 次提交
  20. 20 4月, 2011 2 次提交
  21. 31 3月, 2011 1 次提交
  22. 08 3月, 2011 1 次提交
  23. 23 2月, 2011 1 次提交
  24. 20 1月, 2011 1 次提交
  25. 17 12月, 2010 1 次提交
  26. 11 12月, 2010 1 次提交
  27. 07 12月, 2010 1 次提交
  28. 11 11月, 2010 1 次提交
  29. 04 10月, 2010 2 次提交
  30. 24 9月, 2010 1 次提交
  31. 09 9月, 2010 1 次提交
  32. 07 9月, 2010 1 次提交
  33. 27 8月, 2010 1 次提交
  34. 16 5月, 2010 1 次提交
  35. 02 5月, 2010 1 次提交
    • E
      net: sock_def_readable() and friends RCU conversion · 43815482
      Eric Dumazet 提交于
      sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
      need two atomic operations (and associated dirtying) per incoming
      packet.
      
      RCU conversion is pretty much needed :
      
      1) Add a new structure, called "struct socket_wq" to hold all fields
      that will need rcu_read_lock() protection (currently: a
      wait_queue_head_t and a struct fasync_struct pointer).
      
      [Future patch will add a list anchor for wakeup coalescing]
      
      2) Attach one of such structure to each "struct socket" created in
      sock_alloc_inode().
      
      3) Respect RCU grace period when freeing a "struct socket_wq"
      
      4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
      socket_wq"
      
      5) Change sk_sleep() function to use new sk->sk_wq instead of
      sk->sk_sleep
      
      6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
      a rcu_read_lock() section.
      
      7) Change all sk_has_sleeper() callers to :
        - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
        - Use wq_has_sleeper() to eventually wakeup tasks.
        - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
      
      8) sock_wake_async() is modified to use rcu protection as well.
      
      9) Exceptions :
        macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
      instead of dynamically allocated ones. They dont need rcu freeing.
      
      Some cleanups or followups are probably needed, (possible
      sk_callback_lock conversion to a spinlock for example...).
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43815482