1. 04 5月, 2016 1 次提交
    • J
      tipc: re-enable compensation for socket receive buffer double counting · 7c8bcfb1
      Jon Paul Maloy 提交于
      In the refactoring commit d570d864 ("tipc: enqueue arrived buffers
      in socket in separate function") we did by accident replace the test
      
      if (sk->sk_backlog.len == 0)
           atomic_set(&tsk->dupl_rcvcnt, 0);
      
      with
      
      if (sk->sk_backlog.len)
           atomic_set(&tsk->dupl_rcvcnt, 0);
      
      This effectively disables the compensation we have for the double
      receive buffer accounting that occurs temporarily when buffers are
      moved from the backlog to the socket receive queue. Until now, this
      has gone unnoticed because of the large receive buffer limits we are
      applying, but becomes indispensable when we reduce this buffer limit
      later in this series.
      
      We now fix this by inverting the mentioned condition.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c8bcfb1
  2. 02 5月, 2016 1 次提交
  3. 29 4月, 2016 1 次提交
  4. 25 4月, 2016 1 次提交
  5. 16 4月, 2016 5 次提交
    • J
      tipc: let first message on link be a state message · 34b9cd64
      Jon Paul Maloy 提交于
      According to the link FSM, a received traffic packet can take a link
      from state ESTABLISHING to ESTABLISHED, but the link can still not be
      fully set up in one atomic operation. This means that even if the the
      very first packet on the link is a traffic packet with sequence number
      1 (one), it has to be dropped and retransmitted.
      
      This can be avoided if we let the mentioned packet be preceded by a
      LINK_PROTOCOL/STATE message, which takes up the endpoint before the
      arrival of the traffic.
      
      We add this small feature in this commit.
      
      This is a fully compatible change.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34b9cd64
    • J
      tipc: ensure that first packets on link are sent in order · de7e07f9
      Jon Paul Maloy 提交于
      In some link establishment scenarios we see that packet #2 may be sent
      out before packet #1, forcing the receiver to demand retransmission of
      the missing packet. This is harmless, but may cause confusion among
      people tracing the packet flow.
      
      Since this is extremely easy to fix, we do so by adding en extra send
      call to the bearer immediately after the link has come up.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de7e07f9
    • J
      tipc: refactor function tipc_link_timeout() · 42b18f60
      Jon Paul Maloy 提交于
      The function tipc_link_timeout() is unnecessary complex, and can
      easily be made more readable.
      
      We do that with this commit. The only functional change is that we
      remove a redundant test for whether the broadcast link is up or not.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42b18f60
    • J
      tipc: reduce transmission rate of reset messages when link is down · 88e8ac70
      Jon Paul Maloy 提交于
      When a link is down, it will continuously try to re-establish contact
      with the peer by sending out a RESET or an ACTIVATE message at each
      timeout interval. The default value for this interval is currently
      375 ms. This is wasteful, and may become a problem in very large
      clusters with dozens or hundreds of nodes being down simultaneously.
      
      We now introduce a simple backoff algorithm for these cases. The
      first five messages are sent at default rate; thereafter a message
      is sent only each 16th timer interval.
      
      This will cover the vast majority of link recycling cases, since the
      endpoint starting last will transmit at the higher speed, and the link
      should normally be established well be before the rate needs to be
      reduced.
      
      The only case where we will see a degradation of link re-establishment
      times is when the endpoints remain intact, and a glitch in the
      transmission media is causing the link reset. We will then experience
      a worst-case re-establishing time of 6 seconds, something we deem
      acceptable.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88e8ac70
    • J
      tipc: guarantee peer bearer id exchange after reboot · 634696b1
      Jon Paul Maloy 提交于
      When a link endpoint is going down locally, e.g., because its interface
      is being stopped, it will spontaneously send out a RESET message to
      its peer, informing it about this fact. This saves the peer from
      detecting the failure via probing, and hence gives both speedier and
      less resource consuming failure detection on the peer side.
      
      According to the link FSM, a receiver of a RESET message, ignoring the
      reason for it, must now consider the sender ready to come back up, and
      starts periodically sending out ACTIVATE messages to the peer in order
      to re-establish the link. Also, according to the FSM, the receiver of
      an ACTIVATE message can now go directly to state ESTABLISHED and start
      sending regular traffic packets. This is a well-proven and robust FSM.
      
      However, in the case of a reboot, there is a small possibilty that link
      endpoint on the rebooted node may have been re-created with a new bearer
      identity between the moment it sent its (pre-boot) RESET and the moment
      it receives the ACTIVATE from the peer. The new bearer identity cannot
      be known by the peer according to this scenario, since traffic headers
      don't convey such information. This is a problem, because both endpoints
      need to know the correct value of the peer's bearer id at any moment in
      time in order to be able to produce correct link events for their users.
      
      The only way to guarantee this is to enforce a full setup message
      exchange (RESET + ACTIVATE) even after the reboot, since those messages
      carry the bearer idientity in their header.
      
      In this commit we do this by introducing and setting a "stopping" bit in
      the header of the spontaneously generated RESET messages, informing the
      peer that the sender will not be immediately ready to re-establish the
      link. A receiver seeing this bit must act as if this were a locally
      detected connectivity failure, and hence has to go through a full two-
      way setup message exchange before any link can be re-established.
      
      Although never reported, this problem seems to have always been around.
      
      This protocol addition is fully backwards compatible.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      634696b1
  6. 15 4月, 2016 1 次提交
    • P
      tipc: fix a race condition leading to subscriber refcnt bug · 333f7962
      Parthasarathy Bhuvaragan 提交于
      Until now, the requests sent to topology server are queued
      to a workqueue by the generic server framework.
      These messages are processed by worker threads and trigger the
      registered callbacks.
      To reduce latency on uniprocessor systems, explicit rescheduling
      is performed using cond_resched() after MAX_RECV_MSG_COUNT(25)
      messages.
      
      This implementation on SMP systems leads to an subscriber refcnt
      error as described below:
      When a worker thread yields by calling cond_resched() in a SMP
      system, a new worker is created on another CPU to process the
      pending workitem. Sometimes the sleeping thread wakes up before
      the new thread finishes execution.
      This breaks the assumption on ordering and being single threaded.
      The fault is more frequent when MAX_RECV_MSG_COUNT is lowered.
      
      If the first thread was processing subscription create and the
      second thread processing close(), the close request will free
      the subscriber and the create request oops as follows:
      
      [31.224137] WARNING: CPU: 2 PID: 266 at include/linux/kref.h:46 tipc_subscrb_rcv_cb+0x317/0x380         [tipc]
      [31.228143] CPU: 2 PID: 266 Comm: kworker/u8:1 Not tainted 4.5.0+ #97
      [31.228377] Workqueue: tipc_rcv tipc_recv_work [tipc]
      [...]
      [31.228377] Call Trace:
      [31.228377]  [<ffffffff812fbb6b>] dump_stack+0x4d/0x72
      [31.228377]  [<ffffffff8105a311>] __warn+0xd1/0xf0
      [31.228377]  [<ffffffff8105a3fd>] warn_slowpath_null+0x1d/0x20
      [31.228377]  [<ffffffffa0098067>] tipc_subscrb_rcv_cb+0x317/0x380 [tipc]
      [31.228377]  [<ffffffffa00a4984>] tipc_receive_from_sock+0xd4/0x130 [tipc]
      [31.228377]  [<ffffffffa00a439b>] tipc_recv_work+0x2b/0x50 [tipc]
      [31.228377]  [<ffffffff81071925>] process_one_work+0x145/0x3d0
      [31.246554] ---[ end trace c3882c9baa05a4fd ]---
      [31.248327] BUG: spinlock bad magic on CPU#2, kworker/u8:1/266
      [31.249119] BUG: unable to handle kernel NULL pointer dereference at 0000000000000428
      [31.249323] IP: [<ffffffff81099d0c>] spin_dump+0x5c/0xe0
      [31.249323] PGD 0
      [31.249323] Oops: 0000 [#1] SMP
      
      In this commit, we
      - rename tipc_conn_shutdown() to tipc_conn_release().
      - move connection release callback execution from tipc_close_conn()
        to a new function tipc_sock_release(), which is executed before
        we free the connection.
      Thus we release the subscriber during connection release procedure
      rather than connection shutdown procedure.
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      333f7962
  7. 14 4月, 2016 1 次提交
  8. 12 4月, 2016 2 次提交
  9. 08 4月, 2016 2 次提交
    • J
      tipc: stricter filtering of packets in bearer layer · 5b7066c3
      Jon Paul Maloy 提交于
      Resetting a bearer/interface, with the consequence of resetting all its
      pertaining links, is not an atomic action. This becomes particularly
      evident in very large clusters, where a lot of traffic may happen on the
      remaining links while we are busy shutting them down. In extreme cases,
      we may even see links being re-created and re-established before we are
      finished with the job.
      
      To solve this, we now introduce a solution where we temporarily detach
      the bearer from the interface when the bearer is reset. This inhibits
      all packet reception, while sending still is possible. For the latter,
      we use the fact that the device's user pointer now is zero to filter out
      which packets can be sent during this situation; i.e., outgoing RESET
      messages only.  This filtering serves to speed up the neighbors'
      detection of the loss event, and saves us from unnecessary probing.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b7066c3
    • J
      tipc: eliminate buffer leak in bearer layer · 4e801fa1
      Jon Paul Maloy 提交于
      When enabling a bearer we create a 'neigbor discoverer' instance by
      calling the function tipc_disc_create() before the bearer is actually
      registered in the list of enabled bearers. Because of this, the very
      first discovery broadcast message, created by the mentioned function,
      is lost, since it cannot find any valid bearer to use. Furthermore,
      the used send function, tipc_bearer_xmit_skb() does not free the given
      buffer when it cannot find a  bearer, resulting in the leak of exactly
      one send buffer each time a bearer is enabled.
      
      This commit fixes this problem by introducing two changes:
      
      1) Instead of attemting to send the discovery message directly, we let
         tipc_disc_create() return the discovery buffer to the calling
         function, tipc_enable_bearer(), so that the latter can send it
         when the enabling sequence is finished.
      
      2) In tipc_bearer_xmit_skb(), as well as in the two other transmit
         functions at the bearer layer, we now free the indicated buffer or
         buffer chain when a valid bearer cannot be found.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e801fa1
  10. 15 3月, 2016 1 次提交
  11. 12 3月, 2016 1 次提交
  12. 08 3月, 2016 1 次提交
  13. 07 3月, 2016 6 次提交
  14. 04 3月, 2016 1 次提交
    • P
      tipc: Revert "tipc: use existing sk_write_queue for outgoing packet chain" · f214fc40
      Parthasarathy Bhuvaragan 提交于
      reverts commit 94153e36 ("tipc: use existing sk_write_queue for
      outgoing packet chain")
      
      In Commit 94153e36, we assume that we fill & empty the socket's
      sk_write_queue within the same lock_sock() session.
      
      This is not true if the link is congested. During congestion, the
      socket lock is released while we wait for the congestion to cease.
      This implementation causes a nullptr exception, if the user space
      program has several threads accessing the same socket descriptor.
      
      Consider two threads of the same program performing the following:
           Thread1                                  Thread2
      --------------------                    ----------------------
      Enter tipc_sendmsg()                    Enter tipc_sendmsg()
      lock_sock()                             lock_sock()
      Enter tipc_link_xmit(), ret=ELINKCONG   spin on socket lock..
      sk_wait_event()                             :
      release_sock()                          grab socket lock
          :                                   Enter tipc_link_xmit(), ret=0
          :                                   release_sock()
      Wakeup after congestion
      lock_sock()
      skb = skb_peek(pktchain);
      !! TIPC_SKB_CB(skb)->wakeup_pending = tsk->link_cong;
      
      In this case, the second thread transmits the buffers belonging to
      both thread1 and thread2 successfully. When the first thread wakeup
      after the congestion it assumes that the pktchain is intact and
      operates on the skb's in it, which leads to the following exception:
      
      [2102.439969] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
      [2102.440074] IP: [<ffffffffa005f330>] __tipc_link_xmit+0x2b0/0x4d0 [tipc]
      [2102.440074] PGD 3fa3f067 PUD 3fa6b067 PMD 0
      [2102.440074] Oops: 0000 [#1] SMP
      [2102.440074] CPU: 2 PID: 244 Comm: sender Not tainted 3.12.28 #1
      [2102.440074] RIP: 0010:[<ffffffffa005f330>]  [<ffffffffa005f330>] __tipc_link_xmit+0x2b0/0x4d0 [tipc]
      [...]
      [2102.440074] Call Trace:
      [2102.440074]  [<ffffffff8163f0b9>] ? schedule+0x29/0x70
      [2102.440074]  [<ffffffffa006a756>] ? tipc_node_unlock+0x46/0x170 [tipc]
      [2102.440074]  [<ffffffffa005f761>] tipc_link_xmit+0x51/0xf0 [tipc]
      [2102.440074]  [<ffffffffa006d8ae>] tipc_send_stream+0x11e/0x4f0 [tipc]
      [2102.440074]  [<ffffffff8106b150>] ? __wake_up_sync+0x20/0x20
      [2102.440074]  [<ffffffffa006dc9c>] tipc_send_packet+0x1c/0x20 [tipc]
      [2102.440074]  [<ffffffff81502478>] sock_sendmsg+0xa8/0xd0
      [2102.440074]  [<ffffffff81507895>] ? release_sock+0x145/0x170
      [2102.440074]  [<ffffffff815030d8>] ___sys_sendmsg+0x3d8/0x3e0
      [2102.440074]  [<ffffffff816426ae>] ? _raw_spin_unlock+0xe/0x10
      [2102.440074]  [<ffffffff81115c2a>] ? handle_mm_fault+0x6ca/0x9d0
      [2102.440074]  [<ffffffff8107dd65>] ? set_next_entity+0x85/0xa0
      [2102.440074]  [<ffffffff816426de>] ? _raw_spin_unlock_irq+0xe/0x20
      [2102.440074]  [<ffffffff8107463c>] ? finish_task_switch+0x5c/0xc0
      [2102.440074]  [<ffffffff8163ea8c>] ? __schedule+0x34c/0x950
      [2102.440074]  [<ffffffff81504e12>] __sys_sendmsg+0x42/0x80
      [2102.440074]  [<ffffffff81504e62>] SyS_sendmsg+0x12/0x20
      [2102.440074]  [<ffffffff8164aed2>] system_call_fastpath+0x16/0x1b
      
      In this commit, we maintain the skb list always in the stack.
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f214fc40
  15. 26 2月, 2016 3 次提交
    • F
      tipc: fix null deref crash in compat config path · 619b1745
      Florian Westphal 提交于
      msg.dst_sk needs to be set up with a valid socket because some callbacks
      later derive the netns from it.
      
      Fixes: 263ea09084d172d ("Revert "genl: Add genlmsg_new_unicast() for unicast message allocation")
      Reported-by: NJon Maloy <maloy@donjonn.com>
      Bisected-by: NJon Maloy <maloy@donjonn.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      619b1745
    • J
      tipc: fix crash during node removal · d25a0125
      Jon Paul Maloy 提交于
      When the TIPC module is unloaded, we have identified a race condition
      that allows a node reference counter to go to zero and the node instance
      being freed before the node timer is finished with accessing it. This
      leads to occasional crashes, especially in multi-namespace environments.
      
      The scenario goes as follows:
      
      CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2
      
      1:                                          if(!mod_timer())
      2: if (del_timer())
      3:   tipc_node_put()                                        // ref -> 1
      4: tipc_node_put()                                          // ref -> 0
      5:   kfree_rcu(node);
      6:                                               tipc_node_get(node)
      7:                                               // BOOM!
      
      We now clean up this functionality as follows:
      
      1) We remove the node pointer from the node lookup table before we
         attempt deactivating the timer. This way, we reduce the risk that
         tipc_node_find() may obtain a valid pointer to an instance marked
         for deletion; a harmless but undesirable situation.
      
      2) We use del_timer_sync() instead of del_timer() to safely deactivate
         the node timer without any risk that it might be reactivated by the
         timeout handler. There is no risk of deadlock here, since the two
         functions never touch the same spinlocks.
      
      3: We remove a pointless tipc_node_get() + tipc_node_put() from the
         timeout handler.
      Reported-by: NZhijiang Hu <huzhijiang@gmail.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d25a0125
    • J
      tipc: eliminate risk of finding to-be-deleted node instance · b170997a
      Jon Paul Maloy 提交于
      Although we have never seen it happen, we have identified the
      following problematic scenario when nodes are stopped and deleted:
      
      CPU0:                            CPU1:
      
      tipc_node_xxx()                                   //ref == 1
         tipc_node_put()                                //ref -> 0
                                       tipc_node_find() // node still in table
             tipc_node_delete()
               list_del_rcu(n. list)
                                       tipc_node_get()  //ref -> 1, bad
               kfree_rcu()
      
                                       tipc_node_put() //ref to 0 again.
                                       kfree_rcu()     // BOOM!
      
      We fix this by introducing use of the conditional kref_get_if_not_zero()
      instead of kref_get() in the function tipc_node_find(). This eliminates
      any risk of post-mortem access.
      Reported-by: NZhijiang Hu <huzhijiang@gmail.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b170997a
  16. 20 2月, 2016 1 次提交
  17. 19 2月, 2016 1 次提交
  18. 17 2月, 2016 2 次提交
  19. 06 2月, 2016 8 次提交
    • P
      tipc: use alloc_ordered_workqueue() instead of WQ_UNBOUND w/ max_active = 1 · 06c8581f
      Parthasarathy Bhuvaragan 提交于
      Until now, tipc_rcv and tipc_send workqueues in server are allocated
      with parameters WQ_UNBOUND & max_active = 1.
      This parameters passed to this function makes it equivalent to
      alloc_ordered_workqueue(). The later form is more explicit and
      can inherit future ordered_workqueue changes.
      
      In this commit we replace alloc_workqueue() with more readable
      alloc_ordered_workqueue().
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06c8581f
    • P
      tipc: donot create timers if subscription timeout = TIPC_WAIT_FOREVER · ae245557
      Parthasarathy Bhuvaragan 提交于
      Until now, we create timers even for the subscription requests
      with timeout = TIPC_WAIT_FOREVER.
      This can be improved by avoiding timer creation when the timeout
      is set to TIPC_WAIT_FOREVER.
      
      In this commit, we introduce a check to creates timers only
      when timeout != TIPC_WAIT_FOREVER.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae245557
    • P
      tipc: protect tipc_subscrb_get() with subscriber spin lock · f3ad288c
      Parthasarathy Bhuvaragan 提交于
      Until now, during subscription creation the mod_time() &
      tipc_subscrb_get() are called after releasing the subscriber
      spin lock.
      
      In a SMP system when performing a subscription creation, if the
      subscription timeout occurs simultaneously (the timer is
      scheduled to run on another CPU) then the timer thread
      might decrement the subscribers refcount before the create
      thread increments the refcount.
      
      This can be simulated by creating subscription with timeout=0 and
      sometimes the timeout occurs before the create request is complete.
      This leads to the following message:
      [30.702949] BUG: spinlock bad magic on CPU#1, kworker/u8:3/87
      [30.703834] general protection fault: 0000 [#1] SMP
      [30.704826] CPU: 1 PID: 87 Comm: kworker/u8:3 Not tainted 4.4.0-rc8+ #18
      [30.704826] Workqueue: tipc_rcv tipc_recv_work [tipc]
      [30.704826] task: ffff88003f878600 ti: ffff88003fae0000 task.ti: ffff88003fae0000
      [30.704826] RIP: 0010:[<ffffffff8109196c>]  [<ffffffff8109196c>] spin_dump+0x5c/0xe0
      [...]
      [30.704826] Call Trace:
      [30.704826]  [<ffffffff81091a16>] spin_bug+0x26/0x30
      [30.704826]  [<ffffffff81091b75>] do_raw_spin_lock+0xe5/0x120
      [30.704826]  [<ffffffff81684439>] _raw_spin_lock_bh+0x19/0x20
      [30.704826]  [<ffffffffa0096f10>] tipc_subscrb_rcv_cb+0x1d0/0x330 [tipc]
      [30.704826]  [<ffffffffa00a37b1>] tipc_receive_from_sock+0xc1/0x150 [tipc]
      [30.704826]  [<ffffffffa00a31df>] tipc_recv_work+0x3f/0x80 [tipc]
      [30.704826]  [<ffffffff8106a739>] process_one_work+0x149/0x3c0
      [30.704826]  [<ffffffff8106aa16>] worker_thread+0x66/0x460
      [30.704826]  [<ffffffff8106a9b0>] ? process_one_work+0x3c0/0x3c0
      [30.704826]  [<ffffffff8106a9b0>] ? process_one_work+0x3c0/0x3c0
      [30.704826]  [<ffffffff8107029d>] kthread+0xed/0x110
      [30.704826]  [<ffffffff810701b0>] ? kthread_create_on_node+0x190/0x190
      [30.704826]  [<ffffffff81684bdf>] ret_from_fork+0x3f/0x70
      
      In this commit,
      1. we remove the check for the return code for mod_timer()
      2. we protect tipc_subscrb_get() using the subscriber spin lock.
         We increment the subscriber's refcount as soon as we add the
         subscription to subscriber's subscription list.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3ad288c
    • P
      tipc: hold subscriber->lock for tipc_nametbl_subscribe() · d4091899
      Parthasarathy Bhuvaragan 提交于
      Until now, while creating a subscription the subscriber lock
      protects only the subscribers subscription list and not the
      nametable. The call to tipc_nametbl_subscribe() is outside
      the lock. However, at subscription timeout and cancel both
      the subscribers subscription list and the nametable are
      protected by the subscriber lock.
      
      This asymmetric locking mechanism leads to the following problem:
      In a SMP system, the timer can be fire on another core before
      the create request is complete.
      When the timer thread calls tipc_nametbl_unsubscribe() before create
      thread calls tipc_nametbl_subscribe(), we get a nullptr exception.
      
      This can be simulated by creating subscription with timeout=0 and
      sometimes the timeout occurs before the create request is complete.
      
      The following is the oops:
      [57.569661] BUG: unable to handle kernel NULL pointer dereference at (null)
      [57.577498] IP: [<ffffffffa02135aa>] tipc_nametbl_unsubscribe+0x8a/0x120 [tipc]
      [57.584820] PGD 0
      [57.586834] Oops: 0002 [#1] SMP
      [57.685506] CPU: 14 PID: 10077 Comm: kworker/u40:1 Tainted: P OENX 3.12.48-52.27.1.     9688.1.PTF-default #1
      [57.703637] Workqueue: tipc_rcv tipc_recv_work [tipc]
      [57.708697] task: ffff88064c7f00c0 ti: ffff880629ef4000 task.ti: ffff880629ef4000
      [57.716181] RIP: 0010:[<ffffffffa02135aa>]  [<ffffffffa02135aa>] tipc_nametbl_unsubscribe+0x8a/   0x120 [tipc]
      [...]
      [57.812327] Call Trace:
      [57.814806]  [<ffffffffa0211c77>] tipc_subscrp_delete+0x37/0x90 [tipc]
      [57.821357]  [<ffffffffa0211e2f>] tipc_subscrp_timeout+0x3f/0x70 [tipc]
      [57.827982]  [<ffffffff810618c1>] call_timer_fn+0x31/0x100
      [57.833490]  [<ffffffff81062709>] run_timer_softirq+0x1f9/0x2b0
      [57.839414]  [<ffffffff8105a795>] __do_softirq+0xe5/0x230
      [57.844827]  [<ffffffff81520d1c>] call_softirq+0x1c/0x30
      [57.850150]  [<ffffffff81004665>] do_softirq+0x55/0x90
      [57.855285]  [<ffffffff8105aa35>] irq_exit+0x95/0xa0
      [57.860290]  [<ffffffff815215b5>] smp_apic_timer_interrupt+0x45/0x60
      [57.866644]  [<ffffffff8152005d>] apic_timer_interrupt+0x6d/0x80
      [57.872686]  [<ffffffffa02121c5>] tipc_subscrb_rcv_cb+0x2a5/0x3f0 [tipc]
      [57.879425]  [<ffffffffa021c65f>] tipc_receive_from_sock+0x9f/0x100 [tipc]
      [57.886324]  [<ffffffffa021c826>] tipc_recv_work+0x26/0x60 [tipc]
      [57.892463]  [<ffffffff8106fb22>] process_one_work+0x172/0x420
      [57.898309]  [<ffffffff8107079a>] worker_thread+0x11a/0x3c0
      [57.903871]  [<ffffffff81077114>] kthread+0xb4/0xc0
      [57.908751]  [<ffffffff8151f318>] ret_from_fork+0x58/0x90
      
      In this commit, we do the following at subscription creation:
      1. set the subscription's subscriber pointer before performing
         tipc_nametbl_subscribe(), as this value is required further in
         the call chain ex: by tipc_subscrp_send_event().
      2. move tipc_nametbl_subscribe() under the scope of subscriber lock
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4091899
    • P
      tipc: fix connection abort when receiving invalid cancel request · cb01c7c8
      Parthasarathy Bhuvaragan 提交于
      Until now, the subscribers endianness for a subscription
      create/cancel request is determined as:
          swap = !(s->filter & (TIPC_SUB_PORTS | TIPC_SUB_SERVICE))
      The checks are performed only for port/service subscriptions.
      
      The swap calculation is incorrect if the filter in the subscription
      cancellation request is set to TIPC_SUB_CANCEL (it's a malformed
      cancel request, as the corresponding subscription create filter
      is missing).
      Thus, the check if the request is for cancellation fails and the
      request is treated as a subscription create request. The
      subscription creation fails as the request is illegal, which
      terminates this connection.
      
      In this commit we determine the endianness by including
      TIPC_SUB_CANCEL, which will set swap correctly and the
      request is processed as a cancellation request.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb01c7c8
    • P
      tipc: fix connection abort during subscription cancellation · c8beccc6
      Parthasarathy Bhuvaragan 提交于
      In 'commit 7fe8097c ("tipc: fix nullpointer bug when subscribing
      to events")', we terminate the connection if the subscription
      creation fails.
      In the same commit, the subscription creation result was based on
      the value of subscription pointer (set in the function) instead of
      the return code.
      
      Unfortunately, the same function also handles subscription
      cancellation request. For a subscription cancellation request,
      the subscription pointer cannot be set. Thus the connection is
      terminated during cancellation request.
      
      In this commit, we move the subcription cancel check outside
      of tipc_subscrp_create(). Hence,
      - tipc_subscrp_create() will create a subscripton
      - tipc_subscrb_rcv_cb() will subscribe or cancel a subscription.
      
      Fixes: 'commit 7fe8097c ("tipc: fix nullpointer bug when subscribing to events")'
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8beccc6
    • P
      tipc: introduce tipc_subscrb_subscribe() routine · 7c13c622
      Parthasarathy Bhuvaragan 提交于
      In this commit, we split tipc_subscrp_create() into two:
      1. tipc_subscrp_create() creates a subscription
      2. A new function tipc_subscrp_subscribe() adds the
         subscription to the subscriber subscription list,
         activates the subscription timer and subscribes to
         the nametable updates.
      
      In future commits, the purpose of tipc_subscrb_rcv_cb() will
      be to either subscribe or cancel a subscription.
      
      There is no functional change in this commit.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c13c622
    • P
      tipc: remove struct tipc_name_seq from struct tipc_subscription · a4273c73
      Parthasarathy Bhuvaragan 提交于
      Until now, struct tipc_subscriber has duplicate fields for
      type, upper and lower (as member of struct tipc_name_seq) at:
      1. as member seq in struct tipc_subscription
      2. as member seq in struct tipc_subscr, which is contained
         in struct tipc_event
      The former structure contains the type, upper and lower
      values in network byte order and the later contains the
      intact copy of the request.
      The struct tipc_subscription contains a field swap to
      determine if request needs network byte order conversion.
      Thus by using swap, we can convert the request when
      required instead of duplicating it.
      
      In this commit,
      1. we remove the references to these elements as members of
         struct tipc_subscription and replace them with elements
         from struct tipc_subscr.
      2. provide new functions to convert the user request into
         network byte order.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4273c73