1. 20 3月, 2015 3 次提交
  2. 19 3月, 2015 2 次提交
  3. 18 3月, 2015 3 次提交
    • Y
      tipc: withdraw tipc topology server name when namespace is deleted · 2b9bb7f3
      Ying Xue 提交于
      The TIPC topology server is a per namespace service associated with the
      tipc name {1, 1}. When a namespace is deleted, that name must be withdrawn
      before we call sk_release_kernel because the kernel socket release is
      done in init_net and trying to withdraw a TIPC name published in another
      namespace will fail with an error as:
      
      [  170.093264] Unable to remove local publication
      [  170.093264] (type=1, lower=1, ref=2184244004, key=2184244005)
      
      We fix this by breaking the association between the topology server name
      and socket before calling sk_release_kernel.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b9bb7f3
    • Y
      tipc: fix a potential deadlock when nametable is purged · 8460504b
      Ying Xue 提交于
      [   28.531768] =============================================
      [   28.532322] [ INFO: possible recursive locking detected ]
      [   28.532322] 3.19.0+ #194 Not tainted
      [   28.532322] ---------------------------------------------
      [   28.532322] insmod/583 is trying to acquire lock:
      [   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]
      [   28.532322] but task is already holding lock:
      [   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
      [   28.532322]
      [   28.532322] other info that might help us debug this:
      [   28.532322]  Possible unsafe locking scenario:
      [   28.532322]
      [   28.532322]        CPU0
      [   28.532322]        ----
      [   28.532322]   lock(&(&nseq->lock)->rlock);
      [   28.532322]   lock(&(&nseq->lock)->rlock);
      [   28.532322]
      [   28.532322]  *** DEADLOCK ***
      [   28.532322]
      [   28.532322]  May be due to missing lock nesting notation
      [   28.532322]
      [   28.532322] 3 locks held by insmod/583:
      [   28.532322]  #0:  (net_mutex){+.+.+.}, at: [<ffffffff8163e30f>] register_pernet_subsys+0x1f/0x50
      [   28.532322]  #1:  (&(&tn->nametbl_lock)->rlock){+.....}, at: [<ffffffffa000e091>] tipc_nametbl_stop+0xb1/0x1f0 [tipc]
      [   28.532322]  #2:  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
      [   28.532322]
      [   28.532322] stack backtrace:
      [   28.532322] CPU: 1 PID: 583 Comm: insmod Not tainted 3.19.0+ #194
      [   28.532322] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [   28.532322]  ffffffff82394460 ffff8800144cb928 ffffffff81792f3e 0000000000000007
      [   28.532322]  ffffffff82394460 ffff8800144cba28 ffffffff810a8080 ffff8800144cb998
      [   28.532322]  ffffffff810a4df3 ffff880013e9cb10 ffffffff82b0d330 ffff880013e9cb38
      [   28.532322] Call Trace:
      [   28.532322]  [<ffffffff81792f3e>] dump_stack+0x4c/0x65
      [   28.532322]  [<ffffffff810a8080>] __lock_acquire+0x740/0x1ca0
      [   28.532322]  [<ffffffff810a4df3>] ? __bfs+0x23/0x270
      [   28.532322]  [<ffffffff810a7506>] ? check_irq_usage+0x96/0xe0
      [   28.532322]  [<ffffffff810a8a73>] ? __lock_acquire+0x1133/0x1ca0
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffffa000e11e>] tipc_nametbl_stop+0x13e/0x1f0 [tipc]
      [   28.532322]  [<ffffffffa000dfe5>] ? tipc_nametbl_stop+0x5/0x1f0 [tipc]
      [   28.532322]  [<ffffffffa0004bab>] tipc_init_net+0x13b/0x150 [tipc]
      [   28.532322]  [<ffffffffa0004a75>] ? tipc_init_net+0x5/0x150 [tipc]
      [   28.532322]  [<ffffffff8163dece>] ops_init+0x4e/0x150
      [   28.532322]  [<ffffffff810aa66d>] ? trace_hardirqs_on+0xd/0x10
      [   28.532322]  [<ffffffff8163e1d3>] register_pernet_operations+0xf3/0x190
      [   28.532322]  [<ffffffff8163e31e>] register_pernet_subsys+0x2e/0x50
      [   28.532322]  [<ffffffffa002406a>] tipc_init+0x6a/0x1000 [tipc]
      [   28.532322]  [<ffffffffa0024000>] ? 0xffffffffa0024000
      [   28.532322]  [<ffffffff810002d9>] do_one_initcall+0x89/0x1c0
      [   28.532322]  [<ffffffff811b7cb0>] ? kmem_cache_alloc_trace+0x50/0x1b0
      [   28.532322]  [<ffffffff810e725b>] ? do_init_module+0x2b/0x200
      [   28.532322]  [<ffffffff810e7294>] do_init_module+0x64/0x200
      [   28.532322]  [<ffffffff810e9353>] load_module+0x12f3/0x18e0
      [   28.532322]  [<ffffffff810e5890>] ? show_initstate+0x50/0x50
      [   28.532322]  [<ffffffff810e9a19>] SyS_init_module+0xd9/0x110
      [   28.532322]  [<ffffffff8179f3b3>] sysenter_dispatch+0x7/0x1f
      
      Before tipc_purge_publications() calls tipc_nametbl_remove_publ() to
      remove a publication with a name sequence, the name sequence's lock
      is held. However, when tipc_nametbl_remove_publ() calling
      tipc_nameseq_remove_publ() to remove the publication, it first tries
      to query name sequence instance with the publication, and then holds
      the lock of the found name sequence. But as the lock may be already
      taken in tipc_purge_publications(), deadlock happens like above
      scenario demonstrated. As tipc_nameseq_remove_publ() doesn't grab name
      sequence's lock, the deadlock can be avoided if it's directly invoked
      by tipc_purge_publications().
      
      Fixes: 97ede29e ("tipc: convert name table read-write lock to RCU")
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8460504b
    • Y
      tipc: fix netns refcnt leak · 76100a8a
      Ying Xue 提交于
      When the TIPC module is loaded, we launch a topology server in kernel
      space, which in its turn is creating TIPC sockets for communication
      with topology server users. Because both the socket's creator and
      provider reside in the same module, it is necessary that the TIPC
      module's reference count remains zero after the server is started and
      the socket created; otherwise it becomes impossible to perform "rmmod"
      even on an idle module.
      
      Currently, we achieve this by defining a separate "tipc_proto_kern"
      protocol struct, that is used only for kernel space socket allocations.
      This structure has the "owner" field set to NULL, which restricts the
      module reference count from being be bumped when sk_alloc() for local
      sockets is called. Furthermore, we have defined three kernel-specific
      functions, tipc_sock_create_local(), tipc_sock_release_local() and
      tipc_sock_accept_local(), to avoid the module counter being modified
      when module local sockets are created or deleted. This has worked well
      until we introduced name space support.
      
      However, after name space support was introduced, we have observed that
      a reference count leak occurs, because the netns counter is not
      decremented in tipc_sock_delete_local().
      
      This commit remedies this problem. But instead of just modifying
      tipc_sock_delete_local(), we eliminate the whole parallel socket
      handling infrastructure, and start using the regular sk_create_kern(),
      kernel_accept() and sk_release_kernel() calls. Since those functions
      manipulate the module counter, we must now compensate for that by
      explicitly decrementing the counter after module local sockets are
      created, and increment it just before calling sk_release_kernel().
      
      Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reported-by: NCong Wang <cwang@twopensource.com>
      Tested-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76100a8a
  4. 15 3月, 2015 7 次提交
    • J
      tipc: clean up handling of message priorities · e3eea1eb
      Jon Paul Maloy 提交于
      Messages transferred by TIPC are assigned an "importance priority", -an
      integer value indicating how to treat the message when there is link or
      destination socket congestion.
      
      There is no separate header field for this value. Instead, the message
      user values have been chosen in ascending order according to perceived
      importance, so that the message user field can be used for this.
      
      This is not a good solution. First, we have many more users than the
      needed priority levels, so we end up with treating more priority
      levels than necessary. Second, the user field cannot always
      accurately reflect the priority of the message. E.g., a message
      fragment packet should really have the priority of the enveloped
      user data message, and not the priority of the MSG_FRAGMENTER user.
      Until now, we have been working around this problem in different ways,
      but it is now time to implement a consistent way of handling such
      priorities, although still within the constraint that we cannot
      allocate any more bits in the regular data message header for this.
      
      In this commit, we define a new priority level, TIPC_SYSTEM_IMPORTANCE,
      that will be the only one used apart from the four (lower) user data
      levels. All non-data messages map down to this priority. Furthermore,
      we take some free bits from the MSG_FRAGMENTER header and allocate
      them to store the priority of the enveloped message. We then adjust
      the functions msg_importance()/msg_set_importance() so that they
      read/set the correct header fields depending on user type.
      
      This small protocol change is fully compatible, because the code at
      the receiving end of a link currently reads the importance level
      only from user data messages, where there is no change.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3eea1eb
    • J
      tipc: split link outqueue · 05dcc5aa
      Jon Paul Maloy 提交于
      struct tipc_link contains one single queue for outgoing packets,
      where both transmitted and waiting packets are queued.
      
      This infrastructure is hard to maintain, because we need
      to keep a number of fields to keep track of which packets are
      sent or unsent, and the number of packets in each category.
      
      A lot of code becomes simpler if we split this queue into a transmission
      queue, where sent/unacknowledged packets are kept, and a backlog queue,
      where we keep the not yet sent packets.
      
      In this commit we do this separation.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05dcc5aa
    • J
      tipc: eliminate unnecessary call to broadcast ack function · 2cdf3918
      Jon Paul Maloy 提交于
      The unicast packet header contains a broadcast acknowledge sequence
      number, that may need to be conveyed to the broadcast link for proper
      treatment. Currently, the function tipc_rcv(), which is on the most
      critical data path, calls the function tipc_bclink_acknowledge() to
      have this done. This call is made for each received packet, and results
      in the unconditional grabbing of the broadcast link spinlock.
      
      This is unnecessary, since we can see directly from tipc_rcv() if
      the acknowledged number differs from what has been previously acked
      from the node in question. In the vast majority of cases the numbers
      won't differ, and there is nothing to update.
      
      We now make the call to tipc_bclink_acknowledge() conditional
      to that the two ack values differ.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2cdf3918
    • J
      tipc: extract bundled buffers by cloning instead of copying · c1336ee4
      Jon Paul Maloy 提交于
      When we currently extract a bundled buffer from a message bundle in
      the function tipc_msg_extract(), we allocate a new buffer and explicitly
      copy the linear data area.
      
      This is unnecessary, since we can just clone the buffer and do
      skb_pull() on the clone to move the data pointer to the correct
      position.
      
      This is what we do in this commit.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c1336ee4
    • J
      tipc: eliminate unnecessary linearization of incoming buffers · 1149557d
      Jon Paul Maloy 提交于
      Currently, TIPC linearizes all incoming buffers directly at reception
      before passing them upwards in the stack. This is clearly a waste of
      CPU resources, and must be avoided.
      
      In this commit, we eliminate this unnecessary linearization. We still
      ensure that at least the message header is linear, and that the buffer
      is linearized where this is still needed, i.e. when unbundling and when
      reversing messages.
      
      In addition, we ensure that fragmented messages are validated after
      reassembly before delivering them upwards in the stack.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1149557d
    • J
      tipc: move message validation function to msg.c · cf2157f8
      Jon Paul Maloy 提交于
      The function link_buf_validate() is in reality re-entrant and context
      independent, and will in later commits be called from several locations.
      Therefore, we move it to msg.c, make it outline and rename the it to
      tipc_msg_validate().
      
      We also redesign the function to make proper use of pskb_may_pull()
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf2157f8
    • J
      tipc: add framework for node capabilities exchange · 7764d6e8
      Jon Paul Maloy 提交于
      The TIPC protocol spec has defined a 13 bit capability bitmap in
      the neighbor discovery header, as a means to maintain compatibility
      between different code and protocol generations. Until now this field
      has been unused.
      
      We now introduce the basic framework for exchanging capabilities
      between nodes at first contact. After exchange, a peer node's
      capabilities are stored as a 16 bit bitmap in struct tipc_node.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7764d6e8
  5. 11 3月, 2015 1 次提交
    • J
      tipc: ensure that idle links are deleted when a bearer is disabled · 169bf912
      Jon Paul Maloy 提交于
      commit afaa3f65
      (tipc: purge links when bearer is disabled) was an attempt to resolve
      a problem that turned out to have a more profound reason.
      
      When we disable a bearer, we delete all its pertaining links if
      there is no other bearer to perform failover to, or if the module
      is shutting down. In case there are dual bearers, we wait with
      deleting links until the failover procedure is finished.
      
      However, this misses the case when a link on the removed bearer
      was already down, so that there will be no failover procedure to
      finish the link delete. This causes confusion if a new bearer is
      added to replace the removed one, and also entails a small memory
      leak.
      
      This commit takes the current state of the link into account when
      deciding when to delete it, and also reverses the above-mentioned
      commit.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      169bf912
  6. 10 3月, 2015 3 次提交
    • J
      tipc: fix bug in link failover handling · e6441bae
      Jon Paul Maloy 提交于
      In commit c637c103
      ("tipc: resolve race problem at unicast message reception") we
      introduced a new mechanism for delivering buffers upwards from link
      to socket layer.
      
      That code contains a bug in how we handle the new link input queue
      during failover. When a link is reset, some of its users may be blocked
      because of congestion, and in order to resolve this, we add any pending
      wakeup pseudo messages to the link's input queue, and deliver them to
      the socket. This misses the case where the other, remaining link also
      may have congested users. Currently, the owner node's reference to the
      remaining link's input queue is unconditionally overwritten by the
      reset link's input queue. This has the effect that wakeup events from
      the remaining link may be unduely delayed (but not lost) for a
      potentially long period.
      
      We fix this by adding the pending events from the reset link to the
      input queue that is currently referenced by the node, whichever one
      it is.
      
      This commit should be applied to both net and net-next.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6441bae
    • E
      tipc: fix inconsistent signal handling regression · 143fe22f
      Erik Hugne 提交于
      Commit 9bbb4ecc ("tipc: standardize recvmsg routine") changed
      the sleep/wakeup behaviour for sockets entering recv() or accept().
      In this process the order of reporting -EAGAIN/-EINTR was reversed.
      This caused problems with wrong errno being reported back if the
      timeout expires. The same problem happens if the socket is
      nonblocking and recv()/accept() is called when the process have
      pending signals. If there is no pending data read or connections to
      accept, -EINTR will be returned instead of -EAGAIN.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Reported-by László Benedek <laszlo.benedek@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      143fe22f
    • E
      tipc: sparse: fix htons conversion warnings · 4fee6be8
      Erik Hugne 提交于
      Commit d0f91938 ("tipc: add ip/udp media type") introduced
      some new sparse warnings. Clean them up.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fee6be8
  7. 06 3月, 2015 2 次提交
  8. 03 3月, 2015 2 次提交
  9. 28 2月, 2015 6 次提交
  10. 10 2月, 2015 11 次提交