1. 17 1月, 2014 2 次提交
    • Y
      tipc: standardize accept routine · 6398e23c
      Ying Xue 提交于
      Comparing the behaviour of how to wait for events in TIPC accept()
      with other stacks, the TIPC implementation might be perceived as
      different, and sometimes even incorrect. As sk_sleep() and
      sk->sk_receive_queue variables associated with socket are not
      protected by socket lock, the process of calling accept() may be
      woken up improperly or sometimes cannot be woken up at all. After
      standardizing it with inet_csk_wait_for_connect routine, we can
      get benefits including: avoiding 'thundering herd' phenomenon,
      adding a timeout mechanism for accept(), coping with a pending
      signal, and having sk_sleep() and sk->sk_receive_queue being
      always protected within socket lock scope and so on.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6398e23c
    • Y
      tipc: standardize connect routine · 78eb3a53
      Ying Xue 提交于
      Comparing the behaviour of how to wait for events in TIPC connect()
      with other stacks, the TIPC implementation might be perceived as
      different, and sometimes even incorrect. For instance, as both
      sock->state and sk_sleep() are directly fed to
      wait_event_interruptible_timeout() as its arguments, and socket lock
      has to be released before we call wait_event_interruptible_timeout(),
      the two variables associated with socket are exposed out of socket
      lock protection, thereby probably getting stale values so that the
      process of calling connect() cannot be woken up exactly even if
      correct event arrives or it is woken up improperly even if the wake
      condition is not satisfied in practice. Therefore, standardizing its
      behaviour with sk_stream_wait_connect routine can avoid these risks.
      
      Additionally the implementation of connect routine is simplified as a
      whole, allowing it to return correct values in all different cases.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      78eb3a53
  2. 15 1月, 2014 1 次提交
  3. 08 1月, 2014 5 次提交
    • J
      tipc: make link start event synchronous · 581465fa
      Jon Paul Maloy 提交于
      When a link is created we delay the start event by launching it
      to be executed later in a tasklet. As we hold all the
      necessary locks at the moment of creation, and there is no risk
      of deadlock or contention, this delay serves no purpose in the
      current code.
      
      We remove this obsolete indirection step, and the associated function
      link_start(). At the same time, we rename the function tipc_link_stop()
      to the more appropriate tipc_link_purge_queues().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      581465fa
    • Y
      tipc: introduce new spinlock to protect struct link_req · f9a2c80b
      Ying Xue 提交于
      Currently, only 'bearer_lock' is used to protect struct link_req in
      the function disc_timeout(). This is unsafe, since the member fields
      'num_nodes' and 'timer_intv' might be accessed by below three different
      threads simultaneously, none of them grabbing bearer_lock in the
      critical region:
      
      link_activate()
        tipc_bearer_add_dest()
          tipc_disc_add_dest()
            req->num_nodes++;
      
      tipc_link_reset()
        tipc_bearer_remove_dest()
          tipc_disc_remove_dest()
            req->num_nodes--
            disc_update()
              read req->num_nodes
      	write req->timer_intv
      
      disc_timeout()
        read req->num_nodes
        read/write req->timer_intv
      
      Without lock protection, the only symptom of a race is that discovery
      messages occasionally may not be sent out. This is not fatal, since such
      messages are best-effort anyway. On the other hand, since discovery
      messages are not time critical, adding a protecting lock brings no
      serious overhead either. So we add a new, dedicated spinlock in
      order to guarantee absolute data consistency in link_req objects.
      This also helps reduce the overall role of the bearer_lock, which
      we want to remove completely in a later commit series.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9a2c80b
    • J
      tipc: remove 'has_redundant_link' flag from STATE link protocol messages · b9d4c339
      Jon Paul Maloy 提交于
      The flag 'has_redundant_link' is defined only in RESET and ACTIVATE
      protocol messages. Due to an ambiguity in the protocol specification it
      is currently also transferred in STATE messages. Its value is used to
      initialize a link state variable, 'permit_changeover', which is used
      to inhibit futile link failover attempts when it is known that the
      peer node has no working links at the moment, although the local node
      may still think it has one.
      
      The fact that 'has_redundant_link' incorrectly is read from STATE
      messages has the effect that 'permit_changeover' sometimes gets a wrong
      value, and permanently blocks any links from being re-established. Such
      failures can only occur in in dual-link systems, and are extremely rare.
      This bug seems to have always been present in the code.
      
      Furthermore, since commit b4b56102
      ("tipc: Ensure both nodes recognize loss of contact between them"),
      the 'permit_changeover' field serves no purpose any more. The task of
      enforcing 'lost contact' cycles at both peer endpoints is now taken
      by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN
      in struct tipc_node to abort unnecessary failover attempts.
      
      We therefore remove the 'has_redundant_link' flag from STATE messages,
      as well as the now redundant 'permit_changeover' variable.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b9d4c339
    • J
      tipc: rename functions related to link failover and improve comments · 170b3927
      Jon Paul Maloy 提交于
      The functionality related to link addition and failover is unnecessarily
      hard to understand and maintain. We try to improve this by renaming
      some of the functions, at the same time adding or improving the
      explanatory comments around them. Names such as "tipc_rcv()" etc. also
      align better with what is used in other networking components.
      
      The changes in this commit are purely cosmetic, no functional changes
      are made.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      170b3927
    • E
      tipc: correctly unlink packets from deferred packet queue · 732256b9
      Erik Hugne 提交于
      When we pull a received packet from a link's 'deferred packets' queue
      for processing, its 'next' pointer is not cleared, and still refers to
      the next packet in that queue, if any. This is incorrect, but caused
      no harm before commit 40ba3cdf ("tipc:
      message reassembly using fragment chain") was introduced. After that
      commit, it may sometimes lead to the following oops:
      
      general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      Modules linked in: tipc
      CPU: 4 PID: 0 Comm: swapper/4 Tainted: G        W 3.13.0-rc2+ #6
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      task: ffff880017af4880 ti: ffff880017aee000 task.ti: ffff880017aee000
      RIP: 0010:[<ffffffff81710694>]  [<ffffffff81710694>] skb_try_coalesce+0x44/0x3d0
      RSP: 0018:ffff880016603a78  EFLAGS: 00010212
      RAX: 6b6b6b6bd6d6d6d6 RBX: ffff880013106ac0 RCX: ffff880016603ad0
      RDX: ffff880016603ad7 RSI: ffff88001223ed00 RDI: ffff880013106ac0
      RBP: ffff880016603ab8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000000 R12: ffff88001223ed00
      R13: ffff880016603ad0 R14: 000000000000058c R15: ffff880012297650
      FS:  0000000000000000(0000) GS:ffff880016600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 000000000805b000 CR3: 0000000011f5d000 CR4: 00000000000006e0
      Stack:
       ffff880016603a88 ffffffff810a38ed ffff880016603aa8 ffff88001223ed00
       0000000000000001 ffff880012297648 ffff880016603b68 ffff880012297650
       ffff880016603b08 ffffffffa0006c51 ffff880016603b08 00ffffffa00005fc
      Call Trace:
       <IRQ>
       [<ffffffff810a38ed>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffffa0006c51>] tipc_link_recv_fragment+0xd1/0x1b0 [tipc]
       [<ffffffffa0007214>] tipc_recv_msg+0x4e4/0x920 [tipc]
       [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc]
       [<ffffffffa000177c>] tipc_l2_rcv_msg+0xcc/0x250 [tipc]
       [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc]
       [<ffffffff8171e65b>] __netif_receive_skb_core+0x80b/0xd00
       [<ffffffff8171df94>] ? __netif_receive_skb_core+0x144/0xd00
       [<ffffffff8171eb76>] __netif_receive_skb+0x26/0x70
       [<ffffffff8171ed6d>] netif_receive_skb+0x2d/0x200
       [<ffffffff8171fe70>] napi_gro_receive+0xb0/0x130
       [<ffffffff815647c2>] e1000_clean_rx_irq+0x2c2/0x530
       [<ffffffff81565986>] e1000_clean+0x266/0x9c0
       [<ffffffff81985f7b>] ? notifier_call_chain+0x2b/0x160
       [<ffffffff8171f971>] net_rx_action+0x141/0x310
       [<ffffffff81051c1b>] __do_softirq+0xeb/0x480
       [<ffffffff819817bb>] ? _raw_spin_unlock+0x2b/0x40
       [<ffffffff810b8c42>] ? handle_fasteoi_irq+0x72/0x100
       [<ffffffff81052346>] irq_exit+0x96/0xc0
       [<ffffffff8198cbc3>] do_IRQ+0x63/0xe0
       [<ffffffff81981def>] common_interrupt+0x6f/0x6f
       <EOI>
      
      This happens when the last fragment of a message has passed through the
      the receiving link's 'deferred packets' queue, and at least one other
      packet was added to that queue while it was there. After the fragment
      chain with the complete message has been successfully delivered to the
      receiving socket, it is released. Since 'next' pointer of the last
      fragment in the released chain now is non-NULL, we get the crash shown
      above.
      
      We fix this by clearing the 'next' pointer of all received packets,
      including those being pulled from the 'deferred' queue, before they
      undergo any further processing.
      
      Fixes: 40ba3cdf ("tipc: message reassembly using fragment chain")
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Reported-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      732256b9
  4. 05 1月, 2014 2 次提交
  5. 02 1月, 2014 1 次提交
  6. 30 12月, 2013 1 次提交
    • Y
      tipc: fix deadlock during socket release · 84602761
      Ying Xue 提交于
      A deadlock might occur if name table is withdrawn in socket release
      routine, and while packets are still being received from bearer.
      
             CPU0                       CPU1
      T0:   recv_msg()               release()
      T1:   tipc_recv_msg()          tipc_withdraw()
      T2:   [grab node lock]         [grab port lock]
      T3:   tipc_link_wakeup_ports() tipc_nametbl_withdraw()
      T4:   [grab port lock]*        named_cluster_distribute()
      T5:   wakeupdispatch()         tipc_link_send()
      T6:                            [grab node lock]*
      
      The opposite order of holding port lock and node lock on above two
      different paths may result in a deadlock. If socket lock instead of
      port lock is used to protect port instance in tipc_withdraw(), the
      reverse order of holding port lock and node lock will be eliminated,
      as a result, the deadlock is killed as well.
      Reported-by: NLars Everbrand <lars.everbrand@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84602761
  7. 17 12月, 2013 4 次提交
  8. 11 12月, 2013 9 次提交
  9. 10 12月, 2013 1 次提交
    • E
      tipc: remove interface state mirroring in bearer · 512137ee
      Erik Hugne 提交于
      struct 'tipc_bearer' is a generic representation of the underlying
      media type, and exists in a one-to-one relationship to each interface
      TIPC is using. The struct contains a 'blocked' flag that mirrors the
      operational and execution state of the represented interface, and is
      updated through notification calls from the latter. The users of
      tipc_bearer are checking this flag before each attempt to send a
      packet via the interface.
      
      This state mirroring serves no purpose in the current code base. TIPC
      links will not discover a media failure any faster through this
      mechanism, and in reality the flag only adds overhead at packet
      sending and reception.
      
      Furthermore, the fact that the flag needs to be protected by a spinlock
      aggregated into tipc_bearer has turned out to cause a serious and
      completely unnecessary deadlock problem.
      
      CPU0                                    CPU1
      ----                                    ----
      Time 0: bearer_disable()                link_timeout()
      Time 1:   spin_lock_bh(&b_ptr->lock)      tipc_link_push_queue()
      Time 2:   tipc_link_delete()                tipc_bearer_blocked(b_ptr)
      Time 3:     k_cancel_timer(&req->timer)       spin_lock_bh(&b_ptr->lock)
      Time 4:       del_timer_sync(&req->timer)
      
      I.e., del_timer_sync() on CPU0 never returns, because the timer handler
      on CPU1 is waiting for the bearer lock.
      
      We eliminate the 'blocked' flag from struct tipc_bearer, along with all
      tests on this flag. This not only resolves the deadlock, but also
      simplifies and speeds up the data path execution of TIPC. It also fits
      well into our ongoing effort to make the locking policy simpler and
      more manageable.
      
      An effect of this change is that we can get rid of functions such as
      tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer().
      We replace the latter with a new function, tipc_reset_bearer(), which
      resets all links associated to the bearer immediately after an
      interface goes down.
      
      A user might notice one slight change in link behaviour after this
      change. When an interface goes down, (e.g. through a NETDEV_DOWN
      event) all attached links will be reset immediately, instead of
      leaving it to each link to detect the failure through a timer-driven
      mechanism. We consider this an improvement, and see no obvious risks
      with the new behavior.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <Paul.Gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      512137ee
  10. 21 11月, 2013 1 次提交
    • H
      net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426
      Hannes Frederic Sowa 提交于
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so, because all the recvmsg handlers must
      cope with the case a plain read() is called on them. read() also sets
      msg_name to NULL.
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove "
      if (!uaddr || msg_sys->msg_namelen == 0)
      	msg->msg_name = NULL
      ".
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3d33426
  11. 20 11月, 2013 1 次提交
  12. 15 11月, 2013 1 次提交
  13. 08 11月, 2013 3 次提交
    • E
      tipc: reassembly failures should cause link reset · a715b49e
      Erik Hugne 提交于
      If appending a received fragment to the pending fragment chain
      in a unicast link fails, the current code tries to force a retransmission
      of the fragment by decrementing the 'next received sequence number'
      field in the link. This is done under the assumption that the failure
      is caused by an out-of-memory situation, an assumption that does
      not hold true after the previous patch in this series.
      
      A failure to append a fragment can now only be caused by a protocol
      violation by the sending peer, and it must hence be assumed that it
      is either malicious or buggy.  Either way, the correct behavior is now
      to reset the link instead of trying to revert its sequence number.
      So, this is what we do in this commit.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a715b49e
    • E
      tipc: message reassembly using fragment chain · 40ba3cdf
      Erik Hugne 提交于
      When the first fragment of a long data data message is received on a link, a
      reassembly buffer large enough to hold the data from this and all subsequent
      fragments of the message is allocated. The payload of each new fragment is
      copied into this buffer upon arrival. When the last fragment is received, the
      reassembled message is delivered upwards to the port/socket layer.
      
      Not only is this an inefficient approach, but it may also cause bursts of
      reassembly failures in low memory situations. since we may fail to allocate
      the necessary large buffer in the first place. Furthermore, after 100 subsequent
      such failures the link will be reset, something that in reality aggravates the
      situation.
      
      To remedy this problem, this patch introduces a different approach. Instead of
      allocating a big reassembly buffer, we now append the arriving fragments
      to a reassembly chain on the link, and deliver the whole chain up to the
      socket layer once the last fragment has been received. This is safe because
      the retransmission layer of a TIPC link always delivers packets in strict
      uninterrupted order, to the reassembly layer as to all other upper layers.
      Hence there can never be more than one fragment chain pending reassembly at
      any given time in a link, and we can trust (but still verify) that the
      fragments will be chained up in the correct order.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40ba3cdf
    • E
      tipc: don't reroute message fragments · 528f6f4b
      Erik Hugne 提交于
      When a message fragment is received in a broadcast or unicast link,
      the reception code will append the fragment payload to a big reassembly
      buffer through a call to the function tipc_recv_fragm(). However, after
      the return of that call, the logics goes on and passes the fragment
      buffer to the function tipc_net_route_msg(), which will simply drop it.
      This behavior is a remnant from the now obsolete multi-cluster
      functionality, and has no relevance in the current code base.
      
      Although currently harmless, this unnecessary call would be fatal
      after applying the next patch in this series, which introduces
      a completely new reassembly algorithm. So we change the code to
      eliminate the redundant call.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      528f6f4b
  14. 31 10月, 2013 1 次提交
    • Y
      tipc: remove two indentation levels in tipc_recv_msg routine · 3af390e2
      Ying Xue 提交于
      The message dispatching part of tipc_recv_msg() is wrapped layers of
      while/if/if/switch, causing out-of-control indentation and does not
      look very good. We reduce two indentation levels by separating the
      message dispatching from the blocks that checks link state and
      sequence numbers, allowing longer function and arg names to be
      consistently indented without wrapping. Additionally we also rename
      "cont" label to "discard" and add one new label called "unlock_discard"
      to make code clearer. In all, these are cosmetic changes that do not
      alter the operation of TIPC in any way.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Cc: David Laight <david.laight@aculab.com>
      Cc: Andreas Bofjäll <andreas.bofjall@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3af390e2
  15. 20 10月, 2013 1 次提交
  16. 19 10月, 2013 6 次提交