1. 11 Dec, 2014 1 commit
  2. 27 Nov, 2014 2 commits
  3. 25 Nov, 2014 1 commit
  4. 22 Nov, 2014 1 commit
  5. 22 Oct, 2014 1 commit
    • Y
      tipc: fix a potential deadlock · 7b8613e0
      Committed by Ying Xue
      Locking dependency detection reported the following possible unsafe
      locking scenario:
      
                 CPU0                          CPU1
      T0:  tipc_named_rcv()                tipc_rcv()
      T1:  [grab nametbl write lock]*      [grab node lock]*
      T2:  tipc_update_nametbl()           tipc_node_link_up()
      T3:  tipc_nodesub_subscribe()        tipc_nametbl_publish()
      T4:  [grab node lock]*               [grab nametbl write lock]*
      
      Taking the nametbl write lock and the node lock in opposite order on
      the two paths above may result in a deadlock. If we move the updating
      of the name table that follows a link state change out of the scope of
      the node lock, the reverse lock order is eliminated, and with it the
      deadlock risk.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
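The lock-order fix described in this commit can be illustrated with a minimal userspace sketch. It assumes pthread mutexes in place of kernel locks, and the names `NOTIFY_LINK_UP`, `node_link_up()` and `node_unlock()` are hypothetical stand-ins, not the actual TIPC code. The idea shown is only that the name table update is recorded as a flag while the node lock is held and carried out after the node lock has been released, so the two locks are never nested on this path.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical flag: a link came up and the name table needs updating. */
#define NOTIFY_LINK_UP 0x1u

struct node {
    pthread_mutex_t lock;
    unsigned int action_flags;   /* work deferred until the lock is dropped */
};

static pthread_mutex_t nametbl_lock = PTHREAD_MUTEX_INITIALIZER;
static int published;            /* stand-in for the name table contents */

/* Takes only the name table lock; never called with node->lock held. */
static void nametbl_publish(void)
{
    pthread_mutex_lock(&nametbl_lock);
    published++;
    pthread_mutex_unlock(&nametbl_lock);
}

/* Called with node->lock held: record the event instead of publishing. */
static void node_link_up(struct node *n)
{
    n->action_flags |= NOTIFY_LINK_UP;
}

/* Drop the node lock first, then perform the deferred name table update. */
static void node_unlock(struct node *n)
{
    unsigned int flags = n->action_flags;

    n->action_flags = 0;
    pthread_mutex_unlock(&n->lock);
    if (flags & NOTIFY_LINK_UP)
        nametbl_publish();
}

/* Receive path: node lock, release, then name table lock. */
static int demo_link_up(void)
{
    struct node n;

    pthread_mutex_init(&n.lock, NULL);
    n.action_flags = 0;
    pthread_mutex_lock(&n.lock);
    node_link_up(&n);
    node_unlock(&n);
    pthread_mutex_destroy(&n.lock);
    return published;
}
```

With this shape, the only path that holds both locks at once is the one that takes the name table lock first, so a single lock order remains.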
  6. 08 Oct, 2014 1 commit
    • J
      tipc: fix bug in multicast congestion handling · 908344cd
      Committed by Jon Maloy
      One aim of commit 50100a5e ("tipc:
      use pseudo message to wake up sockets after link congestion") was
      to handle link congestion abatement in a uniform way for both unicast
      and multicast transmit. However, the latter doesn't work correctly,
      and has been broken since the referenced commit was applied.
      
      If a user now sends a burst of multicast messages that is big
      enough to cause broadcast link congestion, it will be put to sleep,
      and not be woken up when the congestion abates, as it should be.
      
      This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS,
      is set correctly, but in the wrong field. Instead of setting it in the
      'action_flags' field of the arrival node struct, it is by mistake set
      in the dummy node struct that is owned by the broadcast link, where it
      will never be tested for. Second, we cannot use the same flag for waking
      up unicast and multicast users, since the function tipc_node_unlock()
      needs to pick the wakeup pseudo messages to deliver from different
      queues. It must hence be able to distinguish between the two cases.
      
      This commit solves this problem by adding a new flag
      TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user().
      The latter is to be called by tipc_node_unlock() when the named flag,
      now set in the correct field, is encountered.
      
      v2: using explicit 'unsigned int' declaration instead of 'uint', as
      per comment from David Miller.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
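As a rough illustration of why two distinct flags are needed, the sketch below models the two wakeup queues as simple counters. The flag values, the struct layout and the counter-based queues are assumptions made for the example; only the flag names and the 'action_flags' field come from the commit message.

```c
#include <assert.h>

/* Flag names follow the commit message; the bit values are assumed. */
#define TIPC_WAKEUP_USERS        (1u << 0)   /* unicast link users */
#define TIPC_WAKEUP_BCAST_USERS  (1u << 1)   /* broadcast link users */

struct node {
    unsigned int action_flags;
    int ucast_wakeups;   /* pending wakeup messages on the unicast link */
};

static int bcast_wakeups;   /* pending wakeup messages on the broadcast link */

/* Returns how many users were woken; each flag selects its own queue. */
static int node_unlock(struct node *n)
{
    unsigned int flags = n->action_flags;
    int woken = 0;

    n->action_flags = 0;
    if (flags & TIPC_WAKEUP_USERS) {
        woken += n->ucast_wakeups;
        n->ucast_wakeups = 0;
    }
    if (flags & TIPC_WAKEUP_BCAST_USERS) {
        woken += bcast_wakeups;
        bcast_wakeups = 0;
    }
    return woken;
}

/* Only broadcast congestion abated: unicast sleepers must stay asleep. */
static int demo_bcast_only(void)
{
    struct node n = { TIPC_WAKEUP_BCAST_USERS, 2 };
    int woken;

    bcast_wakeups = 3;
    woken = node_unlock(&n);
    return woken == 3 && n.ucast_wakeups == 2 && bcast_wakeups == 0;
}
```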
  7. 24 Aug, 2014 2 commits
    • J
      tipc: use message to abort connections when losing contact to node · 02be61a9
      Committed by Jon Paul Maloy
      In the current implementation, each 'struct tipc_node' instance keeps
      a linked list of those ports/sockets that are connected to the node
      represented by that struct. The purpose of this is to let the node
      object know which sockets to alert when it loses contact with its peer
      node, i.e., which sockets need to have their connections aborted.
      
      This entails an unwanted direct reference from the node structure
      back to the port/socket structure, and a need to grab port_lock
      when we have to make an upcall to the port. We want to get rid of
      this unnecessary BH entry point into the socket, and also eliminate
      its use of port_lock.
      
      In this commit, we instead let the node struct keep a list of
      "connected socket" structs, each of which represents a connected
      socket but is allocated independently by the node at the moment of
      connection. If the node loses contact with its peer node, the list is
      traversed, and a "connection abort" message is created for each entry
      in the list. The message is sent to its respective connected socket
      using the ordinary data path, and the receiving socket aborts its
      connections upon reception of the message.
      
      This enables us to get rid of the direct reference from 'struct node' to
      'struct port', and another unwanted BH access point to the latter.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
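The "connected socket" list can be pictured with the following userspace sketch. All names here (`conn_entry`, `node_add_conn()`, `send_abort_msg()`) are hypothetical, and the message send is replaced by recording the destination port in an array. The point is that the node keeps only small, independently allocated entries and never touches the socket structure itself.

```c
#include <assert.h>
#include <stdlib.h>

/* One entry per connected socket, allocated by the node (hypothetical). */
struct conn_entry {
    unsigned int port;           /* identifies the socket, no pointer to it */
    struct conn_entry *next;
};

struct node {
    struct conn_entry *conns;
};

static unsigned int aborted[8];  /* stand-in for messages put on the data path */
static int n_aborted;

/* Stand-in for sending a "connection abort" message to a socket. */
static void send_abort_msg(unsigned int port)
{
    if (n_aborted < 8)
        aborted[n_aborted++] = port;
}

/* Called at the moment of connection. */
static int node_add_conn(struct node *n, unsigned int port)
{
    struct conn_entry *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    e->port = port;
    e->next = n->conns;
    n->conns = e;
    return 0;
}

/* On lost contact: one abort message per entry, then free the entry. */
static void node_lost_contact(struct node *n)
{
    while (n->conns) {
        struct conn_entry *e = n->conns;

        n->conns = e->next;
        send_abort_msg(e->port);
        free(e);
    }
}

static int demo_lost_contact(void)
{
    struct node n = { NULL };

    if (node_add_conn(&n, 1001) || node_add_conn(&n, 1002))
        return -1;
    node_lost_contact(&n);
    return n_aborted;
}
```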
    • J
      tipc: use pseudo message to wake up sockets after link congestion · 50100a5e
      Committed by Jon Paul Maloy
      The current link implementation keeps a linked list of blocked ports/
      sockets that is populated when there is link congestion. The purpose
      of this is to let the link know which users to wake up when the
      congestion abates.
      
      This adds unnecessary complexity to the data structure and the code,
      since it forces us to involve the link each time we want to delete
      a socket. It also forces us to grab the spinlock port_lock within
      the scope of node_lock. We want to get rid of this direct dependence,
      as well as the deadlock hazard resulting from the usage of port_lock.
      
      In this commit, we instead let the link keep a list of "wakeup" pseudo
      messages for use in such situations. Those messages are sent to the
      pending sockets via the ordinary message reception path, and wake up
      the socket's owner when they are received.
      
      This enables us to get rid of the 'waiting_ports' linked lists in struct
      tipc_port that manifest this direct reference. As a consequence, we can
      eliminate another BH entry into the socket, and hence the need to grab
      port_lock. This is a further step in our effort to remove port_lock
      altogether.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 17 Jul, 2014 1 commit
  9. 28 Jun, 2014 1 commit
  10. 15 May, 2014 1 commit
  11. 09 May, 2014 2 commits
  12. 06 May, 2014 4 commits
  13. 29 Apr, 2014 1 commit
  14. 27 Apr, 2014 2 commits
  15. 23 Apr, 2014 2 commits
    • Y
      tipc: purge tipc_net_lock lock · 7216cd94
      Committed by Ying Xue
      The TIPC routing hierarchy now comprises the structures 'node', 'link'
      and 'bearer'. The whole hierarchy is protected by a big read/write lock,
      tipc_net_lock, to ensure that nothing is added or removed while code
      is accessing any of these structures. This locking policy binds the
      node, link and bearer components closely together, so that their
      relationship becomes unnecessarily complex. In the worst case, such
      a locking policy not only has a negative influence on performance,
      but is also prone to occasional deadlocks.
      
      In order to decouple the complex relationship between bearer and node
      as well as link, the locking policy is adjusted as follows:
      
      - Bearer level
        RTNL lock is used on update side, and RCU is used on read side.
        Meanwhile, all bearer instances including broadcast bearer are
        saved into bearer_list array.
      
      - Node and link level
        All node instances are saved into two lists, tipc_node_list and
        node_htable. The two lists are protected by node_list_lock on the
        write side, and guarded by RCU on the read side. All members of the
        node structure, including link instances, are protected by the node
        spin lock.
      
      - The relationship between bearer and node
        When link accesses bearer, it first needs to find the bearer with
        its bearer identity from the bearer_list array. When bearer accesses
        node, it can iterate the node_htable hash list with the node
        address to find the corresponding node.
      
      In the new locking policy, every component has its own locking
      solution and the relationship between bearer and node is very simple:
      they can find each other by node address or bearer identity
      from the node_htable hash list or the bearer_list array.
      
      All of the above changes have now been made, so tipc_net_lock can be
      removed safely.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Tested-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
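The node lookup by address mentioned in this commit can be sketched as a plain chained hash table. This is a simplified userspace illustration: the table size, the hash function and the helper names are assumptions, and the node_list_lock and RCU protection described above are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define NODE_HTABLE_SIZE 16   /* assumed size, for illustration only */

struct node {
    unsigned int addr;
    struct node *next;        /* hash chain */
};

static struct node *node_htable[NODE_HTABLE_SIZE];

static unsigned int node_hash(unsigned int addr)
{
    return addr % NODE_HTABLE_SIZE;
}

/* Write side; the real code would hold node_list_lock here. */
static void node_insert(struct node *n)
{
    unsigned int h = node_hash(n->addr);

    n->next = node_htable[h];
    node_htable[h] = n;
}

/* Read side; the real code would run inside an RCU read-side section. */
static struct node *node_find(unsigned int addr)
{
    struct node *n;

    for (n = node_htable[node_hash(addr)]; n; n = n->next)
        if (n->addr == addr)
            return n;
    return NULL;
}

static int demo_lookup(void)
{
    static struct node a = { 1, NULL }, b = { 17, NULL };
    int i;

    for (i = 0; i < NODE_HTABLE_SIZE; i++)
        node_htable[i] = NULL;
    node_insert(&a);          /* 1 and 17 share a bucket */
    node_insert(&b);
    return node_find(17) == &b && node_find(1) == &a &&
           node_find(33) == NULL;
}
```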
    • Y
      tipc: decouple the relationship between bearer and link · 7a2f7d18
      Committed by Ying Xue
      Currently, on both the message transmission and reception paths, the
      read lock of tipc_net_lock must be held before the bearer is accessed,
      while the write lock of tipc_net_lock has to be taken before the bearer
      is configured. Although this ensures that the bearer is always valid on
      the two data paths, link and bearer are closely bound together.
      
      So, as part of the effort to remove tipc_net_lock, the locking
      policy of bearer protection is adjusted as follows: on the two
      data paths, RCU is used, and on the configuration path of the bearer,
      the RTNL lock is applied.
      
      At present, RCU covers only the message reception path. To make it
      possible to protect the message transmission path with RCU, a link
      should not use its stored bearer pointer to access the bearer, but
      should use the bearer identity of its attached bearer as an index to
      get the bearer instance from the bearer_list array, which helps us
      decouple the relationship between bearer and link. As a result, the
      bearer on the message transmission path can be safely protected by RCU
      when we access the bearer_list array under RCU lock protection.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Tested-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
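The switch from a stored bearer pointer to a bearer identity used as an array index can be sketched as follows. The array size, the struct fields and the `link_xmit()` helper are assumptions for the example; in the kernel the array slot would be read with an RCU dereference, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BEARERS 3   /* assumed size, for illustration only */

struct bearer {
    int mtu;
};

/* The link stores an identity, not a pointer, so it never holds a
 * reference that could dangle when the bearer is disabled. */
struct link {
    unsigned int bearer_id;
};

static struct bearer *bearer_list[MAX_BEARERS];

/* Look the bearer up on every use; NULL means it has been disabled. */
static struct bearer *link_bearer(const struct link *l)
{
    if (l->bearer_id >= MAX_BEARERS)
        return NULL;
    return bearer_list[l->bearer_id];
}

/* Returns the bearer MTU on success, -1 if the bearer is gone. */
static int link_xmit(const struct link *l)
{
    struct bearer *b = link_bearer(l);

    return b ? b->mtu : -1;
}

static int demo_bearer_lookup(void)
{
    struct bearer eth = { 1500 };
    struct link l = { 1 };
    int up, down;

    bearer_list[1] = &eth;    /* bearer enabled */
    up = link_xmit(&l);
    bearer_list[1] = NULL;    /* bearer disabled on the configuration path */
    down = link_xmit(&l);
    return up == 1500 && down == -1;
}
```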
  16. 28 Mar, 2014 5 commits
  17. 19 Feb, 2014 1 commit
    • Y
      tipc: align tipc function names with common naming practice in the network · 247f0f3c
      Committed by Ying Xue
      Rename the following functions to names that are shorter and more in
      line with common naming practice in the network subsystem.
      
      tipc_bclink_send_msg->tipc_bclink_xmit
      tipc_bclink_recv_pkt->tipc_bclink_rcv
      tipc_disc_recv_msg->tipc_disc_rcv
      tipc_link_send_proto_msg->tipc_link_proto_xmit
      link_recv_proto_msg->tipc_link_proto_rcv
      link_send_sections_long->tipc_link_iovec_long_xmit
      tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast
      tipc_link_send_sync->tipc_link_sync_xmit
      tipc_link_recv_sync->tipc_link_sync_rcv
      tipc_link_send_buf->__tipc_link_xmit
      tipc_link_send->tipc_link_xmit
      tipc_link_send_names->tipc_link_names_xmit
      tipc_named_recv->tipc_named_rcv
      tipc_link_recv_bundle->tipc_link_bundle_rcv
      tipc_link_dup_send_queue->tipc_link_dup_queue_xmit
      link_send_long_buf->tipc_link_frag_xmit
      
      tipc_multicast->tipc_port_mcast_xmit
      tipc_port_recv_mcast->tipc_port_mcast_rcv
      tipc_port_reject_sections->tipc_port_iovec_reject
      tipc_port_recv_proto_msg->tipc_port_proto_rcv
      tipc_connect->tipc_port_connect
      __tipc_connect->__tipc_port_connect
      __tipc_disconnect->__tipc_port_disconnect
      tipc_disconnect->tipc_port_disconnect
      tipc_shutdown->tipc_port_shutdown
      tipc_port_recv_msg->tipc_port_rcv
      tipc_port_recv_sections->tipc_port_iovec_rcv
      
      release->tipc_release
      accept->tipc_accept
      bind->tipc_bind
      get_name->tipc_getname
      poll->tipc_poll
      send_msg->tipc_sendmsg
      send_packet->tipc_send_packet
      send_stream->tipc_send_stream
      recv_msg->tipc_recvmsg
      recv_stream->tipc_recv_stream
      connect->tipc_connect
      listen->tipc_listen
      shutdown->tipc_shutdown
      setsockopt->tipc_setsockopt
      getsockopt->tipc_getsockopt
      
      The above changes have no impact on current users of the functions.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 17 Feb, 2014 1 commit
    • J
      tipc: fix a loop style problem · 074bb43e
      Committed by Jon Paul Maloy
      In commit 7d33939f
      ("tipc: delay delete of link when failover is needed") we
      introduced a loop for finding and removing a link pointer
      in an array. The removal is done after we have left the loop,
      giving the impression that one may remove the wrong pointer
      if no matching element is found.
      
      This is not really a bug, since we know that there will always
      be a matching element, but it looks wrong, and causes a smatch
      warning.
      
      We fix this loop with this commit.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
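The style issue can be shown with a small sketch of the safer loop shape, where the removal happens inside the loop at the point of the match. This is not the code from the commit; the array of opaque pointers and the name `links_remove()` are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINKS 2

/*
 * Remove 'l' from the array. The slot is cleared inside the loop, at the
 * point where the match is found, so a miss can never clear a wrong slot.
 * Returns the index that was cleared, or -1 if 'l' was not present.
 */
static int links_remove(const void *links[], int n, const void *l)
{
    int i;

    for (i = 0; i < n; i++) {
        if (links[i] == l) {
            links[i] = NULL;
            return i;
        }
    }
    return -1;
}

static int demo_remove(void)
{
    int a, b, c;   /* only their addresses are used, as opaque links */
    const void *links[MAX_LINKS] = { &a, &b };
    int miss = links_remove(links, MAX_LINKS, &c);
    int hit = links_remove(links, MAX_LINKS, &b);

    return miss == -1 && hit == 1 &&
           links[0] == &a && links[1] == NULL;
}
```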
  19. 14 Feb, 2014 1 commit
    • J
      tipc: delay delete of link when failover is needed · 7d33939f
      Committed by Jon Paul Maloy
      When a bearer is disabled, all its attached links are deleted.
      Ideally, we should do link failover to redundant links on other bearers,
      if there are any, in such cases. This would be consistent with current
      behavior when a link is reset, but not deleted. However, due to the
      complexity involved, and the (wrongly) perceived low demand for this
      feature, it was never implemented until now.
      
      We mark the doomed link for deletion with a new flag, but wait until the
      failover process is finished before we actually delete it. With the
      improved link tunnelling/failover code introduced earlier in this commit
      series, it is now easy to identify a spot in the code where the failover
      is finished and it is safe to delete the marked link. Moreover, the test
      for the flag and the deletion can be done synchronously, and outside the
      most time critical data path.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
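The mark-then-delete pattern can be sketched as below. The flag name `LINK_DELETE_PENDING`, the packet counter used to detect the end of failover, and the `deleted` field are all assumptions for the example; the commit message only states that a new flag is used and that deletion waits for the failover to finish.

```c
#include <assert.h>

#define LINK_DELETE_PENDING 0x1u   /* hypothetical flag name */

struct link {
    unsigned int flags;
    int failover_pkts;   /* tunnelled packets still expected (assumed model) */
    int deleted;         /* stand-in for actually freeing the link */
};

/* Bearer disabled: mark the link, but keep it alive for the failover. */
static void link_mark_for_delete(struct link *l, int pending_pkts)
{
    l->flags |= LINK_DELETE_PENDING;
    l->failover_pkts = pending_pkts;
}

/* Called for each tunnelled packet; deletes once the failover is done. */
static void link_failover_rcv(struct link *l)
{
    if (l->failover_pkts > 0)
        l->failover_pkts--;
    if (l->failover_pkts == 0 && (l->flags & LINK_DELETE_PENDING))
        l->deleted = 1;
}

static int demo_delayed_delete(void)
{
    struct link l = { 0, 0, 0 };
    int early;

    link_mark_for_delete(&l, 2);
    link_failover_rcv(&l);
    early = l.deleted;            /* still 0: one packet outstanding */
    link_failover_rcv(&l);
    return early == 0 && l.deleted == 1;
}
```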
  20. 08 Jan, 2014 1 commit
  21. 05 Jan, 2014 1 commit
  22. 11 Dec, 2013 1 commit
  23. 08 Nov, 2013 1 commit
    • E
      tipc: message reassembly using fragment chain · 40ba3cdf
      Committed by Erik Hugne
      When the first fragment of a long data message is received on a link, a
      reassembly buffer large enough to hold the data from this and all subsequent
      fragments of the message is allocated. The payload of each new fragment is
      copied into this buffer upon arrival. When the last fragment is received, the
      reassembled message is delivered upwards to the port/socket layer.
      
      Not only is this an inefficient approach, but it may also cause bursts of
      reassembly failures in low memory situations, since we may fail to allocate
      the necessary large buffer in the first place. Furthermore, after 100
      consecutive such failures the link will be reset, something that in reality
      aggravates the situation.
      
      To remedy this problem, this patch introduces a different approach. Instead of
      allocating a big reassembly buffer, we now append the arriving fragments
      to a reassembly chain on the link, and deliver the whole chain up to the
      socket layer once the last fragment has been received. This is safe because
      the retransmission layer of a TIPC link always delivers packets in strict
      uninterrupted order, to the reassembly layer as to all other upper layers.
      Hence there can never be more than one fragment chain pending reassembly at
      any given time in a link, and we can trust (but still verify) that the
      fragments will be chained up in the correct order.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
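The fragment chain approach can be sketched in userspace as a singly linked list that grows by one element per arriving fragment and is handed over as a whole when the last fragment arrives. The struct and function names are hypothetical, and the fragments here carry only a length, not a real payload.

```c
#include <assert.h>
#include <stddef.h>

struct frag {
    size_t len;          /* payload length of this fragment */
    struct frag *next;
};

/* At most one chain is pending per link, since delivery is in order. */
struct link {
    struct frag *head;
    struct frag *tail;
};

/*
 * Append a fragment to the link's reassembly chain. No large buffer is
 * allocated and nothing is copied. Returns the complete chain when the
 * last fragment arrives, otherwise NULL.
 */
static struct frag *link_frag_rcv(struct link *l, struct frag *f, int is_last)
{
    struct frag *chain;

    f->next = NULL;
    if (l->tail)
        l->tail->next = f;
    else
        l->head = f;
    l->tail = f;
    if (!is_last)
        return NULL;
    chain = l->head;
    l->head = l->tail = NULL;   /* ready for the next message */
    return chain;
}

static size_t chain_len(const struct frag *c)
{
    size_t total = 0;

    for (; c; c = c->next)
        total += c->len;
    return total;
}

static size_t demo_reassembly(void)
{
    struct link l = { NULL, NULL };
    struct frag f1 = { 1000, NULL }, f2 = { 1000, NULL }, f3 = { 200, NULL };
    struct frag *chain;

    if (link_frag_rcv(&l, &f1, 0) || link_frag_rcv(&l, &f2, 0))
        return 0;
    chain = link_frag_rcv(&l, &f3, 1);
    if (l.head != NULL)
        return 0;
    return chain_len(chain);
}
```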
  24. 28 Feb, 2013 1 commit
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Committed by Sasha Levin
      I'm not sure why, but the hlist for each entry iterators were conceived
      differently from the list ones. The list iterators take three parameters:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
       were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
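To show what a three-parameter hlist iterator looks like in use, here is a simplified userspace sketch. The macros `demo_entry` and `demo_hlist_for_each_entry` are written for this example and are not the kernel definitions from linux/list.h. They rely on the GNU `__typeof__` extension, so a GCC or Clang compatible compiler is assumed.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

/* Map a node pointer back to its containing entry; NULL stays NULL. */
#define demo_entry(ptr, type, member) \
    ((ptr) ? (type *)((char *)(ptr) - offsetof(type, member)) : NULL)

/* Three parameters only: the entry cursor doubles as the loop state. */
#define demo_hlist_for_each_entry(pos, head, member)                     \
    for (pos = demo_entry((head)->first, __typeof__(*(pos)), member);    \
         pos;                                                            \
         pos = demo_entry((pos)->member.next, __typeof__(*(pos)), member))

struct item {
    int value;
    struct hlist_node link;
};

static int sum_items(struct hlist_head *head)
{
    struct item *it;   /* no separate 'struct hlist_node *' cursor needed */
    int sum = 0;

    demo_hlist_for_each_entry(it, head, link)
        sum += it->value;
    return sum;
}

static int demo_sum(void)
{
    struct item a = { 1, { NULL } }, b = { 2, { NULL } };
    struct hlist_head head = { &a.link };

    a.link.next = &b.link;
    return sum_items(&head);
}
```

The call site now has the same shape as `list_for_each_entry(pos, head, member)`, which is the similarity the commit message is after.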
  25. 23 Nov, 2012 1 commit
    • J
      tipc: introduce message to synchronize broadcast link · c64f7a6a
      Committed by Jon Maloy
      Upon establishing a first link between two nodes, there is
      currently a risk that the two endpoints will disagree on exactly
      which sequence number reception and acknowledging of broadcast
      packets should start from.
      
      The following scenarios may happen:
      
      1: Node A sends an ACTIVATE message to B, telling it to start acking
         packets from sequence number N.
      2: Node A sends out broadcast N, but does not expect an acknowledgement
         from B, since B is not yet in its broadcast receiver's list.
      3: Node A receives ACK for N from all nodes except B, and releases
         packet N.
      4: Node B receives the ACTIVATE, activates its link endpoint, and
         stores the value N as sequence number of first expected packet.
      5: Node B sends a NAME_DISTR message to A.
      6: Node A receives the NAME_DISTR message, and activates its endpoint.
         At this moment B is added to A's broadcast receiver's set.
         Node A also sets sequence number 0 as the first broadcast packet
         to be received from B.
      7: Node A sends broadcast N+1.
      8: B receives N+1, determines there is a gap in the sequence, since
         it is expecting N, and sends a NACK for N back to A.
      9: Node A has already released N, so no retransmission is possible.
         The broadcast link in direction A->B is stale.
      
      In addition to, or instead of, 7-9 above, the following may happen:
      
      10: Node B sends broadcast M > 0 to A.
      11: Node A receives M, falsely decides there must be a gap, since
          it is expecting packet 0, and asks for retransmission of packets
          [0,M-1].
      12: Node B has already released these packets, so the broadcast
          link is stale in direction B->A.
      
      We solve this problem by introducing a new unicast message type,
      BCAST_PROTOCOL/STATE, to convey the sequence number of the next
      sent broadcast packet to the other endpoint, at exactly the moment
      that endpoint is added to the own node's broadcast receivers list,
      and before any other unicast messages are permitted to be sent.
      
      Furthermore, we don't allow any node to start receiving and
      processing broadcast packets until this new synchronization
      message has been received.
      
      To maintain backwards compatibility, we still open up for
      broadcast reception if we receive a NAME_DISTR message without
      any preceding broadcast sync message. In this case, we must
      assume that the other end has an older code version, and will
      never send out the new synchronization message. Hence, for mixed
      old and new nodes, the issue arising in 7-12 of the above may
      happen with the same probability as before.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
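The effect of the synchronization message on the receiving side can be sketched as a small state machine. The struct, the return codes and the function names are assumptions for this example, and sequence number wrap-around is ignored. Broadcast packets are ignored until the sync message has supplied the first expected sequence number, so the receiver never asks for packets the sender has already released.

```c
#include <assert.h>

enum { BC_IGNORED, BC_DELIVERED, BC_GAP };

struct bcast_rcv {
    int synced;             /* has the sync message been received? */
    unsigned int next_in;   /* next expected broadcast sequence number */
};

/* Unicast sync message: carries the sender's next broadcast seqno. */
static void bclink_sync_rcv(struct bcast_rcv *r, unsigned int next_sent)
{
    if (!r->synced) {
        r->next_in = next_sent;
        r->synced = 1;
    }
}

/* Broadcast packet arrival. */
static int bclink_rcv(struct bcast_rcv *r, unsigned int seqno)
{
    if (!r->synced)
        return BC_IGNORED;   /* not yet part of the receiver set */
    if (seqno != r->next_in)
        return BC_GAP;       /* would trigger a NACK in a real link */
    r->next_in++;
    return BC_DELIVERED;
}

/* Sender already released packet 41; its sync message says 42 is next. */
static int demo_sync(void)
{
    struct bcast_rcv b = { 0, 0 };
    int before = bclink_rcv(&b, 41);

    bclink_sync_rcv(&b, 42);
    return before == BC_IGNORED &&
           bclink_rcv(&b, 42) == BC_DELIVERED &&
           bclink_rcv(&b, 44) == BC_GAP;
}
```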
  26. 22 Nov, 2012 2 commits
  27. 14 Jul, 2012 1 commit