1. 04 May, 2016 1 commit
    • tipc: re-enable compensation for socket receive buffer double counting · 7c8bcfb1
      Jon Paul Maloy committed
      In the refactoring commit d570d864 ("tipc: enqueue arrived buffers
      in socket in separate function") we accidentally replaced the test
      
      if (sk->sk_backlog.len == 0)
           atomic_set(&tsk->dupl_rcvcnt, 0);
      
      with
      
      if (sk->sk_backlog.len)
           atomic_set(&tsk->dupl_rcvcnt, 0);
      
      This effectively disables the compensation we have for the double
      receive buffer accounting that occurs temporarily when buffers are
      moved from the backlog to the socket receive queue. Until now, this
      has gone unnoticed because of the large receive buffer limits we are
      applying, but the compensation becomes indispensable when we reduce
      this buffer limit later in this series.
      
      We now fix this by inverting the mentioned condition.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
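The effect of the inverted condition can be sketched in a small standalone model. Only the two `if` conditions come from the commit message; the struct and function names below are hypothetical stand-ins, not the kernel code.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the two socket fields named above. */
struct sock_model {
    unsigned int backlog_len;  /* models sk->sk_backlog.len */
    unsigned int dupl_rcvcnt;  /* models tsk->dupl_rcvcnt */
};

/* Fixed test: clear the compensation only once the backlog is empty. */
void reset_dupl_fixed(struct sock_model *sk)
{
    if (sk->backlog_len == 0)
        sk->dupl_rcvcnt = 0;
}

/* Inverted test from the refactoring: clears while the backlog is in use. */
void reset_dupl_buggy(struct sock_model *sk)
{
    if (sk->backlog_len)
        sk->dupl_rcvcnt = 0;
}

/* Compare both tests on a socket whose backlog still holds data. */
void example_dupl_rcvcnt(void)
{
    struct sock_model a = { .backlog_len = 4096, .dupl_rcvcnt = 1024 };
    struct sock_model b = a;

    reset_dupl_fixed(&a);
    reset_dupl_buggy(&b);
    assert(a.dupl_rcvcnt == 1024); /* compensation kept */
    assert(b.dupl_rcvcnt == 0);    /* compensation lost */
}
```

With a non-empty backlog the inverted test discards the compensation exactly when it is needed, and with an empty backlog it never clears the counter at all.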
  2. 08 March, 2016 1 commit
  3. 04 March, 2016 1 commit
    • tipc: Revert "tipc: use existing sk_write_queue for outgoing packet chain" · f214fc40
      Parthasarathy Bhuvaragan committed
      Reverts commit 94153e36 ("tipc: use existing sk_write_queue for
      outgoing packet chain").

      In commit 94153e36, we assumed that we fill and empty the socket's
      sk_write_queue within the same lock_sock() session.
      
      This is not true if the link is congested. During congestion, the
      socket lock is released while we wait for the congestion to cease.
      This implementation causes a NULL pointer dereference if the user-space
      program has several threads accessing the same socket descriptor.
      
      Consider two threads of the same program performing the following:
           Thread1                                  Thread2
      --------------------                    ----------------------
      Enter tipc_sendmsg()                    Enter tipc_sendmsg()
      lock_sock()                             lock_sock()
      Enter tipc_link_xmit(), ret=ELINKCONG   spin on socket lock..
      sk_wait_event()                             :
      release_sock()                          grab socket lock
          :                                   Enter tipc_link_xmit(), ret=0
          :                                   release_sock()
      Wakeup after congestion
      lock_sock()
      skb = skb_peek(pktchain);
      !! TIPC_SKB_CB(skb)->wakeup_pending = tsk->link_cong;
      
      In this case, the second thread transmits the buffers belonging to
      both thread1 and thread2 successfully. When the first thread wakes up
      after the congestion, it assumes that the pktchain is intact and
      operates on the skbs in it, which leads to the following exception:
      
      [2102.439969] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
      [2102.440074] IP: [<ffffffffa005f330>] __tipc_link_xmit+0x2b0/0x4d0 [tipc]
      [2102.440074] PGD 3fa3f067 PUD 3fa6b067 PMD 0
      [2102.440074] Oops: 0000 [#1] SMP
      [2102.440074] CPU: 2 PID: 244 Comm: sender Not tainted 3.12.28 #1
      [2102.440074] RIP: 0010:[<ffffffffa005f330>]  [<ffffffffa005f330>] __tipc_link_xmit+0x2b0/0x4d0 [tipc]
      [...]
      [2102.440074] Call Trace:
      [2102.440074]  [<ffffffff8163f0b9>] ? schedule+0x29/0x70
      [2102.440074]  [<ffffffffa006a756>] ? tipc_node_unlock+0x46/0x170 [tipc]
      [2102.440074]  [<ffffffffa005f761>] tipc_link_xmit+0x51/0xf0 [tipc]
      [2102.440074]  [<ffffffffa006d8ae>] tipc_send_stream+0x11e/0x4f0 [tipc]
      [2102.440074]  [<ffffffff8106b150>] ? __wake_up_sync+0x20/0x20
      [2102.440074]  [<ffffffffa006dc9c>] tipc_send_packet+0x1c/0x20 [tipc]
      [2102.440074]  [<ffffffff81502478>] sock_sendmsg+0xa8/0xd0
      [2102.440074]  [<ffffffff81507895>] ? release_sock+0x145/0x170
      [2102.440074]  [<ffffffff815030d8>] ___sys_sendmsg+0x3d8/0x3e0
      [2102.440074]  [<ffffffff816426ae>] ? _raw_spin_unlock+0xe/0x10
      [2102.440074]  [<ffffffff81115c2a>] ? handle_mm_fault+0x6ca/0x9d0
      [2102.440074]  [<ffffffff8107dd65>] ? set_next_entity+0x85/0xa0
      [2102.440074]  [<ffffffff816426de>] ? _raw_spin_unlock_irq+0xe/0x20
      [2102.440074]  [<ffffffff8107463c>] ? finish_task_switch+0x5c/0xc0
      [2102.440074]  [<ffffffff8163ea8c>] ? __schedule+0x34c/0x950
      [2102.440074]  [<ffffffff81504e12>] __sys_sendmsg+0x42/0x80
      [2102.440074]  [<ffffffff81504e62>] SyS_sendmsg+0x12/0x20
      [2102.440074]  [<ffffffff8164aed2>] system_call_fastpath+0x16/0x1b
      
      In this commit, we always maintain the skb list on the stack.
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
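The interleaving above can be replayed single-threaded in a small model, as a sketch only. The queue type and helper names are hypothetical and do not mirror the kernel's sk_buff API; the point is just that a queue shared through the socket is drained by the second sender, while a per-call queue is not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal packet and queue types for illustration. */
struct pkt { struct pkt *next; };
struct pktq { struct pkt *head, *tail; };

void q_add(struct pktq *q, struct pkt *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
}

struct pkt *q_peek(const struct pktq *q) { return q->head; }

/* Models a successful transmit: the whole chain is consumed. */
void q_xmit_all(struct pktq *q) { q->head = q->tail = NULL; }

/* Replays the two-thread scenario; returns 1 if thread 1 still finds
 * its packet after waking up, 0 if its peek would return NULL. */
int thread1_chain_intact(struct pktq *q1, struct pktq *q2)
{
    struct pkt p1, p2;
    int intact;

    q_add(q1, &p1);     /* thread 1 builds its chain, link is congested */
                        /* thread 1 sleeps and releases the socket lock */
    q_add(q2, &p2);     /* thread 2 builds its chain */
    q_xmit_all(q2);     /* thread 2 transmits everything on its queue */
    intact = q_peek(q1) != NULL;  /* thread 1 wakes up and peeks */
    q_xmit_all(q1);     /* drop references to the local packets */
    return intact;
}

void example_shared_vs_stack(void)
{
    struct pktq shared = { NULL, NULL };
    struct pktq local1 = { NULL, NULL }, local2 = { NULL, NULL };

    assert(!thread1_chain_intact(&shared, &shared)); /* one queue in the socket */
    assert(thread1_chain_intact(&local1, &local2));  /* one queue per call */
}
```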
  4. 01 December, 2015 1 commit
  5. 24 November, 2015 1 commit
    • tipc: avoid packets leaking on socket receive queue · f4195d1e
      Ying Xue committed
      Even if we drain the receive queue thoroughly in tipc_release() after the
      TIPC socket is removed from the rhashtable, some packets may still be in
      flight, because a CPU running the receive path may have done its
      rhashtable lookup before we removed the socket. Those packets will reach
      the receive queue, but nobody will ever delete them. To avoid this leak,
      we register a private socket destructor that purges the receive queue.
      This means that packets pending on the receive queue are not released
      until the last reference to the TIPC socket is dropped.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 24 October, 2015 2 commits
    • tipc: introduce jumbo frame support for broadcast · 959e1781
      Jon Paul Maloy committed
      Until now, we have supported only a fixed MTU size of 1500 bytes
      for all broadcast media, irrespective of their actual capability.
      
      We now make the broadcast MTU adaptable to the carrying media, i.e.,
      we use the smallest MTU supported by any of the interfaces attached
      to TIPC.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
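The rule "use the smallest MTU supported by any attached interface" amounts to a minimum over the bearers. A minimal sketch, assuming the per-interface MTUs are available as a plain array (the function name and the array representation are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Returns the smallest MTU among the attached interfaces,
 * or 0 when no interface is attached. */
unsigned int bcast_mtu(const unsigned int *bearer_mtu, size_t n)
{
    unsigned int mtu = 0;

    for (size_t i = 0; i < n; i++)
        if (i == 0 || bearer_mtu[i] < mtu)
            mtu = bearer_mtu[i];
    return mtu;
}

void example_bcast_mtu(void)
{
    const unsigned int mixed[] = { 9000, 1500, 9000 };
    const unsigned int jumbo[] = { 9000, 9000 };

    assert(bcast_mtu(mixed, 3) == 1500); /* one standard link limits all */
    assert(bcast_mtu(jumbo, 2) == 9000); /* jumbo frames usable */
}
```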
    • tipc: move bcast definitions to bcast.c · 6beb19a6
      Jon Paul Maloy committed
      Currently, a number of structure and function definitions related
      to the broadcast functionality are unnecessarily exposed in the file
      bcast.h. This obscures the fact that the external interface towards
      the broadcast link in fact is very narrow, and causes unnecessary
      recompilations of other files when anything changes in those
      definitions.
      
      In this commit, we move as many of those definitions as is currently
      possible to the file bcast.c.
      
      We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
      since the name does not correctly describe the contents of this
      struct, and will do so even less in the future, and because we want
      to use the term 'link' more appropriately in the functionality
      introduced later in this series.
      
      Finally, we rename a couple of functions, such as tipc_bclink_xmit()
      and others that will be kept in the future, to include the term 'bcast'
      instead.
      
      There are no functional changes in this commit.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 27 July, 2015 3 commits
    • tipc: clean up socket layer message reception · cda3696d
      Jon Paul Maloy committed
      When a message is received in a socket, one of the call chains
      tipc_sk_rcv()->tipc_sk_enqueue()->filter_rcv()(->tipc_sk_proto_rcv())
      or
      tipc_sk_backlog_rcv()->filter_rcv()(->tipc_sk_proto_rcv())
      is followed. At each of these levels we may encounter situations
      where the message may need to be rejected, or a new message
      produced for transfer back to the sender. Despite recent
      improvements, the current code for doing this is perceived
      as awkward and hard to follow.
      
      Leveraging the two previous commits in this series, we now
      introduce a more uniform handling of such situations. We
      let each of the functions in the chain itself produce/reverse
      the message to be returned to the sender, but also perform the
      actual forwarding. This simplifies the necessary logic within
      each function.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce new tipc_sk_respond() function · bcd3ffd4
      Jon Paul Maloy committed
      Currently, we use the code sequence
      
      if (msg_reverse())
         tipc_link_xmit_skb()
      
      at numerous locations in socket.c. The preparation of arguments
      for these calls, as well as the sequence itself, makes the code
      unnecessarily complex.
      
      In this commit, we introduce a new function, tipc_sk_respond(),
      that performs this call combination. We also replace some, but not
      yet all, of these explicit call sequences with calls to the new
      function. Notably, we let the function tipc_sk_proto_rcv() use
      the new function to directly send out PROBE_REPLY messages,
      instead of deferring this to the calling tipc_sk_rcv() function,
      as we do now.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: let function tipc_msg_reverse() expand header when needed · 29042e19
      Jon Paul Maloy committed
      The shortest TIPC message header, for cluster local CONNECTED messages,
      is 24 bytes long. With this format, the fields "dest_node" and
      "orig_node" are optimized away, since they in reality are redundant
      in this particular case.
      
      However, the absence of these fields leads to code inconsistencies
      that are difficult to handle in some cases, especially when we need
      to reverse or reject messages at the socket layer.
      
      In this commit, we concentrate the handling of the absent fields
      to one place, by letting the function tipc_msg_reverse() reallocate
      the buffer and expand the header to 32 bytes when necessary. This
      means that the socket code now can assume that the two previously
      absent fields are present in the header when a message needs to be
      rejected. This opens the way for some further simplifications of the
      socket code.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
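Expanding a 24-byte header to 32 bytes by reallocating the buffer can be sketched as follows. This is an illustration only: the struct, the function name, and the assumption that the two added fields occupy the last 8 header bytes and start out zeroed are all hypothetical, not taken from the commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { SHORT_HDR = 24, FULL_HDR = 32 };

/* Hypothetical flat message buffer: header followed by payload. */
struct msg_model {
    unsigned char *data;
    size_t hdr_len;   /* header bytes at the start of data */
    size_t data_len;  /* header plus payload bytes */
};

/* Grows a short header to FULL_HDR bytes, keeping the payload.
 * Returns 0 on success, -1 if the allocation fails. */
int msg_expand_hdr(struct msg_model *m)
{
    if (m->hdr_len >= FULL_HDR)
        return 0;

    size_t grow = FULL_HDR - m->hdr_len;
    unsigned char *n = calloc(1, m->data_len + grow);

    if (!n)
        return -1;
    memcpy(n, m->data, m->hdr_len);              /* old header */
    memcpy(n + FULL_HDR, m->data + m->hdr_len,   /* payload */
           m->data_len - m->hdr_len);
    free(m->data);
    m->data = n;
    m->data_len += grow;
    m->hdr_len = FULL_HDR;
    return 0;
}

void example_expand_hdr(void)
{
    struct msg_model m = { malloc(SHORT_HDR), SHORT_HDR, SHORT_HDR };

    assert(m.data != NULL);
    memset(m.data, 0x11, SHORT_HDR);
    assert(msg_expand_hdr(&m) == 0);
    assert(m.hdr_len == FULL_HDR && m.data_len == FULL_HDR);
    free(m.data);
}
```

After the call, code that handles the message can read the two extra fields unconditionally instead of checking the header size first.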
  8. 21 July, 2015 2 commits
    • tipc: make media xmit call outside node spinlock context · af9b028e
      Jon Paul Maloy committed
      Currently, message sending is performed through a deep call chain,
      where the node spinlock is grabbed and held during a significant
      part of the transmission time. This is clearly detrimental to
      overall throughput performance; it would be better if we could send
      the message after the spinlock has been released.
      
      In this commit, we instead let the call unwind the stack after the
      buffer chain has been added to the transmission queue, after which
      clones of the buffers are transmitted to the device layer outside the
      spinlock scope.
      
      As a further step in our effort to separate the roles of the node
      and link entities we also move the function tipc_link_xmit() to
      node.c, and rename it to tipc_node_xmit().
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: change sk_buffer handling in tipc_link_xmit() · 22d85c79
      Jon Paul Maloy committed
      When the function tipc_link_xmit() is given a buffer list for
      transmission, it currently consumes the list both when transmission
      is successful and when it fails, except for the special case when
      it encounters link congestion.
      
      This behavior is inconsistent, and needs to be corrected if we want
      to avoid problems in later commits in this series.
      
      In this commit, we change this to let the function consume the list
      only when transmission is successful, and leave the list with the
      sender in all other cases. We also modify the socket code so that
      it adapts to this change, i.e., purges the list when a non-congestion
      error code is returned.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
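The ownership rule described above (the transmit function consumes the list only on success, the sender purges it on any non-congestion error) can be sketched as a small model. The return codes, struct, and function names are hypothetical placeholders, not the kernel's.

```c
#include <assert.h>

/* Hypothetical return codes for the model. */
enum { XMIT_OK = 0, XMIT_CONG = -1, XMIT_ERR = -2 };

/* Hypothetical buffer list: only its length and a purge marker. */
struct buf_list_model { int len; int purged; };

/* Transmit side: consumes the list only when transmission succeeds. */
int link_xmit_model(struct buf_list_model *list, int outcome)
{
    if (outcome == XMIT_OK)
        list->len = 0;
    return outcome;
}

/* Sender side: keeps the list on congestion (to retry later) and
 * purges it itself on any other error. */
int send_model(struct buf_list_model *list, int outcome)
{
    int rc = link_xmit_model(list, outcome);

    if (rc != XMIT_OK && rc != XMIT_CONG) {
        list->len = 0;
        list->purged = 1;
    }
    return rc;
}

void example_xmit_ownership(void)
{
    struct buf_list_model ok = { 3, 0 }, cong = { 3, 0 }, err = { 3, 0 };

    send_model(&ok, XMIT_OK);
    send_model(&cong, XMIT_CONG);
    send_model(&err, XMIT_ERR);
    assert(ok.len == 0 && !ok.purged);   /* consumed by the link */
    assert(cong.len == 3);               /* still owned by the sender */
    assert(err.len == 0 && err.purged);  /* purged by the sender */
}
```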
  9. 09 July, 2015 1 commit
  10. 11 June, 2015 1 commit
  11. 31 May, 2015 1 commit
  12. 15 May, 2015 1 commit
    • tipc: simplify include dependencies · a6bf70f7
      Jon Paul Maloy committed
      When we try to add new inline functions in the code, we sometimes
      run into circular include dependencies.
      
      The main problem is that the file core.h, which really should be at
      the root of the dependency chain, instead is a leaf. I.e., core.h
      includes a number of header files that themselves should be allowed
      to include core.h. In reality this is unnecessary, because core.h does
      not need to know the full definition of any of the structs it refers to,
      only their type declarations.
      
      In this commit, we remove all dependencies from core.h towards any
      other tipc header file.
      
      As a consequence of this change, we can now move the function
      tipc_own_addr(net) from addr.c to addr.h, and make it inline.
      
      There are no functional changes in this commit.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
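The technique relied on here is the forward declaration: a header that only stores or passes pointers to a struct can declare the type without including the header that defines it. A single-file sketch, with hypothetical type and function names standing in for the real headers:

```c
#include <assert.h>

/* "core.h" side (hypothetical): an incomplete type is enough here. */
struct node_model;                    /* forward declaration, no #include */
unsigned int node_addr(const struct node_model *n);

struct core_model {
    struct node_model *own_node;      /* pointer only, so no size is needed */
};

/* "node.h" side (hypothetical): the full definition. This header is now
 * free to include "core.h" without creating a cycle. */
struct node_model {
    unsigned int addr;
};

unsigned int node_addr(const struct node_model *n)
{
    return n->addr;
}

void example_forward_decl(void)
{
    struct node_model n = { 0x1001001 };
    struct core_model c = { &n };

    assert(node_addr(c.own_node) == 0x1001001);
}
```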
  13. 11 May, 2015 1 commit
  14. 23 April, 2015 1 commit
  15. 25 March, 2015 2 commits
  16. 24 March, 2015 1 commit
  17. 21 March, 2015 1 commit
  18. 20 March, 2015 2 commits
  19. 19 March, 2015 1 commit
  20. 18 March, 2015 1 commit
    • tipc: fix netns refcnt leak · 76100a8a
      Ying Xue committed
      When the TIPC module is loaded, we launch a topology server in kernel
      space, which in turn creates TIPC sockets for communication
      with topology server users. Because both the socket's creator and
      provider reside in the same module, it is necessary that the TIPC
      module's reference count remains zero after the server is started and
      the socket created; otherwise it becomes impossible to perform "rmmod"
      even on an idle module.
      
      Currently, we achieve this by defining a separate "tipc_proto_kern"
      protocol struct, that is used only for kernel space socket allocations.
      This structure has the "owner" field set to NULL, which prevents the
      module reference count from being bumped when sk_alloc() for local
      sockets is called. Furthermore, we have defined three kernel-specific
      functions, tipc_sock_create_local(), tipc_sock_release_local() and
      tipc_sock_accept_local(), to avoid the module counter being modified
      when module local sockets are created or deleted. This has worked well
      until we introduced name space support.
      
      However, after name space support was introduced, we have observed that
      a reference count leak occurs, because the netns counter is not
      decremented in tipc_sock_delete_local().
      
      This commit remedies this problem. But instead of just modifying
      tipc_sock_delete_local(), we eliminate the whole parallel socket
      handling infrastructure, and start using the regular sk_create_kern(),
      kernel_accept() and sk_release_kernel() calls. Since those functions
      manipulate the module counter, we must now compensate for that by
      explicitly decrementing the counter after module local sockets are
      created, and increment it just before calling sk_release_kernel().
      
      Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reported-by: Cong Wang <cwang@twopensource.com>
      Tested-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
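The compensation described in the last paragraph can be modelled with a plain counter. This is a sketch under stated assumptions: the struct and all function names are hypothetical stand-ins for the generic create/release paths, not kernel APIs.

```c
#include <assert.h>

/* Hypothetical model of the module reference counter. */
struct module_model { int refcnt; };

/* Stand-ins for the generic socket paths, which adjust the counter. */
void generic_create(struct module_model *m)  { m->refcnt++; }
void generic_release(struct module_model *m) { m->refcnt--; }

/* Module-local socket: undo the bump so the socket does not pin the module. */
void local_sock_create(struct module_model *m)
{
    generic_create(m);
    m->refcnt--;
}

/* Restore the count just before the generic release decrements it. */
void local_sock_release(struct module_model *m)
{
    m->refcnt++;
    generic_release(m);
}

void example_module_refcnt(void)
{
    struct module_model m = { 0 };

    local_sock_create(&m);
    assert(m.refcnt == 0);   /* "rmmod" stays possible while the socket exists */
    local_sock_release(&m);
    assert(m.refcnt == 0);   /* balanced after release */
}
```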
  21. 10 March, 2015 1 commit
  22. 03 March, 2015 2 commits
  23. 28 February, 2015 1 commit
  24. 10 February, 2015 3 commits
  25. 09 February, 2015 1 commit
  26. 06 February, 2015 6 commits
    • tipc: eliminate race condition at multicast reception · cb1b7280
      Jon Paul Maloy committed
      In a previous commit in this series we resolved a race problem during
      unicast message reception.
      
      Here, we resolve the same problem at multicast reception. We apply the
      same technique: an input queue serializing the delivery of arriving
      buffers. The main difference is that here we do it in two steps.
      First, the broadcast link feeds arriving buffers into the tail of an
      arrival queue, whose head is consumed at the socket level, and where
      destination lookup is performed. Second, if the lookup is successful,
      the resulting buffer clones are fed into a second queue, the input
      queue. This queue is consumed at reception in the socket just like
      in the unicast case. Both queues are protected by the same lock, the
      one of the input queue.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: simplify socket multicast reception · 3c724acd
      Jon Paul Maloy committed
      The structure 'tipc_port_list' is used to collect port numbers
      representing multicast destination sockets on a receiving node.
      The list is not based on a standard linked list, and is in reality
      optimized for the uncommon case that there is more than one
      multicast destination per node. This makes the list handling
      unnecessarily complex, and as a consequence, even the socket
      multicast reception becomes more complex.
      
      In this commit, we replace 'tipc_port_list' with a new 'struct
      tipc_plist', which is based on a standard list. We give the new
      list stack (push/pop) semantics, something that simplifies
      the implementation of the function tipc_sk_mcast_rcv().
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
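A port list with stack (push/pop) semantics can be sketched on top of a simple singly linked list. The types and functions below are hypothetical and do not reproduce 'struct tipc_plist'; in particular, returning 0 from an empty pop is an assumption made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list item holding one destination port number. */
struct plist_item {
    unsigned int port;
    struct plist_item *next;
};

struct plist_model { struct plist_item *top; };

/* Pushes a port; returns 0 on success, -1 if the allocation fails. */
int plist_push(struct plist_model *pl, unsigned int port)
{
    struct plist_item *item = malloc(sizeof(*item));

    if (!item)
        return -1;
    item->port = port;
    item->next = pl->top;
    pl->top = item;
    return 0;
}

/* Pops the most recently pushed port; returns 0 when the list is empty. */
unsigned int plist_pop(struct plist_model *pl)
{
    struct plist_item *item = pl->top;
    unsigned int port;

    if (!item)
        return 0;
    port = item->port;
    pl->top = item->next;
    free(item);
    return port;
}

void example_plist(void)
{
    struct plist_model pl = { NULL };

    assert(plist_push(&pl, 10) == 0);
    assert(plist_push(&pl, 20) == 0);
    assert(plist_pop(&pl) == 20);   /* last in, first out */
    assert(plist_pop(&pl) == 10);
    assert(plist_pop(&pl) == 0);    /* empty */
}
```

A receive loop can then simply pop ports until 0 is returned, delivering one buffer clone per port.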
    • tipc: resolve race problem at unicast message reception · c637c103
      Jon Paul Maloy committed
      TIPC handles message cardinality and sequencing at the link layer,
      before passing messages upwards to the destination sockets. During the
      upcall from link to socket no locks are held. It is therefore possible,
      and we see it happen occasionally, that messages arriving in different
      threads and delivered in sequence still bypass each other before they
      reach the destination socket. This must not happen, since it violates
      the sequentiality guarantee.
      
      We solve this by adding a new input buffer queue to the link structure.
      Arriving messages are added safely to the tail of that queue by the
      link, while the head of the queue is consumed, also safely, by the
      receiving socket. Sequentiality is secured per socket by only allowing
      buffers to be dequeued inside the socket lock. Since there may be multiple
      simultaneous readers of the queue, we use a 'filter' parameter to reduce
      the risk that they peek the same buffer from the queue, hence also
      reducing the risk of contention on the receiving socket locks.
      
      This solves the sequentiality problem, and seems to cause no measurable
      performance degradation.
      
      A nice side effect of this change is that lock handling in the functions
      tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
      will enable future simplifications of those functions.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: use existing sk_write_queue for outgoing packet chain · 94153e36
      Jon Paul Maloy committed
      The list for outgoing traffic buffers from a socket is currently
      allocated on the stack. This forces us to initialize the queue for
      each sent message, something costing extra CPU cycles in the most
      critical data path. Later in this series we will introduce a new
      safe input buffer queue, something that would force us to initialize
      even the spinlock of the outgoing queue. A closer analysis reveals
      that the queue always is filled and emptied within the same lock_sock()
      session. It is therefore safe to use a queue aggregated in the socket
      itself for this purpose. Since there already exists a queue for this
      in struct sock, sk_write_queue, we introduce use of that queue in
      this commit.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: split up function tipc_msg_eval() · e3a77561
      Jon Paul Maloy committed
      The function tipc_msg_eval() is in reality doing two related, but
      different tasks. First it tries to find a new destination for named
      messages, in case there was no first lookup, or if the first lookup
      failed. Second, it does what its name suggests, evaluating the validity
      of the message and its destination, and returning an appropriate error
      code depending on the result.
      
      This is confusing, and in this commit we choose to break it up into two
      functions. A new function, tipc_msg_lookup_dest(), first attempts to find
      a new destination, if the message is of the right type. If this lookup
      fails, or if the message should not be subject to a second lookup, the
      already existing tipc_msg_reverse() is called. This function
      prepares the message for rejection, if applicable.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: enqueue arrived buffers in socket in separate function · d570d864
      Jon Paul Maloy committed
      The code for enqueuing arriving buffers in the function tipc_sk_rcv()
      contains long code lines and currently goes to two indentation levels.
      As a cosmetic preparation for the next commits, we break it out into
      a separate function.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>