1. 08 4月, 2013 1 次提交
    • M
      tipc: fix info leaks via msg_name in recv_msg/recv_stream · 60085c3d
      Mathias Krause 提交于
      The code in set_orig_addr() does not initialize all of the members of
      struct sockaddr_tipc when filling the sockaddr info -- namely the union
      is only partly filled. This will make recv_msg() and recv_stream() --
      the only users of this function -- leak kernel stack memory as the
      msg_name member is a local variable in net/socket.c.
      
      Additionally to that both recv_msg() and recv_stream() fail to update
      the msg_namelen member to 0 while otherwise returning with 0, i.e.
      "success". This is the case for, e.g., non-blocking sockets. This will
      lead to a 128 byte kernel stack leak in net/socket.c.
      
      Fix the first issue by initializing the memory of the union with
      memset(0). Fix the second one by setting msg_namelen to 0 early as it
      will be updated later if we're going to fill the msg_name member.
      
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60085c3d
  2. 28 2月, 2013 1 次提交
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin 提交于
      I'm not sure why, but the hlist for each entry iterators were conceived
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
  3. 16 2月, 2013 4 次提交
  4. 12 1月, 2013 1 次提交
  5. 08 12月, 2012 8 次提交
    • P
      tipc: refactor accept() code for improved readability · 0fef8f20
      Paul Gortmaker 提交于
      In TIPC's accept() routine, there is a large block of code relating
      to initialization of a new socket, all within an if condition checking
      if the allocation succeeded.
      
      Here, we simply flip the check of the if, so that the main execution
      path stays at the same indentation level, which improves readability.
      If the allocation fails, we jump to an already existing exit label.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      0fef8f20
    • Y
      tipc: add lock nesting notation to quiet lockdep warning · 258f8667
      Ying Xue 提交于
      TIPC accept() call grabs the socket lock on a newly allocated
      socket while holding the socket lock on an old socket. But lockdep
      worries that this might be a recursive lock attempt:
      
        [ INFO: possible recursive locking detected ]
        ---------------------------------------------
        kworker/u:0/6 is trying to acquire lock:
        (sk_lock-AF_TIPC){+.+.+.}, at: [<c8c1226c>] accept+0x15c/0x310 [tipc]
      
        but task is already holding lock:
        (sk_lock-AF_TIPC){+.+.+.}, at: [<c8c12138>] accept+0x28/0x310 [tipc]
      
        other info that might help us debug this:
        Possible unsafe locking scenario:
      
                CPU0
                ----
                lock(sk_lock-AF_TIPC);
                lock(sk_lock-AF_TIPC);
      
                *** DEADLOCK ***
      
        May be due to missing lock nesting notation
        [...]
      
      Tell lockdep that this locking is safe by using lock_sock_nested().
      This is similar to what was done in commit 5131a184 for
      SCTP code ("SCTP: lock_sock_nested in sctp_sock_migrate").
      
      Also note that this is isn't something that is seen normally,
      as it was uncovered with some experimental work-in-progress
      code not yet ready for mainline.  So no need for stable
      backports or similar of this commit.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      258f8667
    • Y
      tipc: eliminate connection setup for implied connect in recv_msg() · cbab3687
      Ying Xue 提交于
      As connection setup is now completed asynchronously in BH context,
      in the function filter_connect(), the corresponding code in recv_msg()
      becomes redundant.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      cbab3687
    • Y
      tipc: introduce non-blocking socket connect · 584d24b3
      Ying Xue 提交于
      TIPC has so far only supported blocking connect(), meaning that a call
      to connect() doesn't return until either the connection is fully
      established, or an error occurs. This has proved insufficient for many
      users, so we now introduce non-blocking connect(), analogous to how
      this is done in TCP and other protocols.
      
      With this feature, if a connection cannot be established instantly,
      connect() will return the error code "-EINPROGRESS".
      If the user later calls connect() again, he will either have the
      return code "-EALREADY" or "-EISCONN", depending on whether the
      connection has been established or not.
      
      The user must have explicitly set the socket to be non-blocking
      (SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
      for some reason they had set this already (the socket would anyway
      remain blocking in current TIPC) this change should be completely
      backwards compatible.
      
      It is also now possible to call select() or poll() to wait for the
      completion of a connection.
      
      An effect of the above is that the actual completion of a connection
      may now be performed asynchronously, independent of the calls from
      user space. Therefore, we now execute this code in BH context, in
      the function filter_rcv(), which is executed upon reception of
      messages in the socket.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      [PG: minor refactoring for improved connect/disconnect function names]
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      584d24b3
    • Y
      tipc: consolidate connection-oriented message reception in one function · 7e6c131e
      Ying Xue 提交于
      Handling of connection-related message reception is currently scattered
      around at different places in the code. This makes it harder to verify
      that things are handled correctly in all possible scenarios.
      So we consolidate the existing processing of connection-oriented
      message reception in a single routine.  In the process, we convert the
      chain of if/else into a switch/case for improved readability.
      
      A cast on the socket_state in the switch is needed to avoid compile
      warnings on 32 bit, like "net/tipc/socket.c:1252:2: warning: case value
      ‘4294967295’ not in enumerated type".  This happens because existing
      tipc code pseudo extends the default linux socket state values with:
      
      	#define SS_LISTENING    -1      /* socket is listening */
      	#define SS_READY        -2      /* socket is connectionless */
      
      It may make sense to add these as _positive_ values to the existing
      socket state enum list someday, vs. these already existing defines.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      [PG: add cast to fix warning; remove returns from middle of switch]
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      7e6c131e
    • P
      tipc: standardize across connect/disconnect function naming · bc879117
      Paul Gortmaker 提交于
      Currently we have tipc_disconnect and tipc_disconnect_port.  It is
      not clear from the names alone, what they do or how they differ.
      It turns out that tipc_disconnect just deals with the port locking
      and then calls tipc_disconnect_port which does all the work.
      
      If we rename as follows: tipc_disconnect_port --> __tipc_disconnect
      then we will be following typical linux convention, where:
      
         __tipc_disconnect: "raw" function that does all the work.
      
         tipc_disconnect: wrapper that deals with locking and then calls
      		    the real core __tipc_disconnect function
      
      With this, the difference is immediately evident, and locking
      violations are more apt to be spotted by chance while working on,
      or even just while reading the code.
      
      On the connect side of things, we currently only have the single
      "tipc_connect2port" function.  It does both the locking at enter/exit,
      and the core of the work.  Pending changes will make it desireable to
      have the connect be a two part locking wrapper + worker function,
      just like the disconnect is already.
      
      Here, we make the connect look just like the updated disconnect case,
      for the above reason, and for consistency.  In the process, we also
      get rid of the "2port" suffix that was on the original name, since
      it adds no descriptive value.
      
      On close examination, one might notice that the above connect
      changes implicitly move the call to tipc_link_get_max_pkt() to be
      within the scope of tipc_port_lock() protected region; when it was
      not previously.  We don't see any issues with this, and it is in
      keeping with __tipc_connect doing the work and tipc_connect just
      handling the locking.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      bc879117
    • J
      tipc: change sk_receive_queue upper limit · e643df15
      Jon Maloy 提交于
      The sk_recv_queue upper limit for connectionless sockets has empirically
      turned out to be too low. When we double the current limit we get much
      fewer rejected messages and no noticable negative side-effects.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      e643df15
    • Y
      tipc: eliminate aggregate sk_receive_queue limit · 9da3d475
      Ying Xue 提交于
      As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
      global atomic counter for the sum of sk_recv_queue sizes across all
      tipc sockets. When incremented, the counter is compared to an upper
      threshold value, and if this is reached, the message is rejected
      with error code TIPC_OVERLOAD.
      
      This check was originally meant to protect the node against
      buffer exhaustion and general CPU overload. However, all experience
      indicates that the feature not only is redundant on Linux, but even
      harmful. Users run into the limit very often, causing disturbances
      for their applications, while removing it seems to have no negative
      effects at all. We have also seen that overall performance is
      boosted significantly when this bottleneck is removed.
      
      Furthermore, we don't see any other network protocols maintaining
      such a mechanism, something strengthening our conviction that this
      control can be eliminated.
      
      As a result, the atomic variable tipc_queue_size is now unused
      and so it can be deleted.  There is a getsockopt call that used
      to allow reading it; we retain that but just return zero for
      maximum compatibility.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      [PG: phase out tipc_queue_size as pointed out by Neil Horman]
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      9da3d475
  6. 07 12月, 2012 1 次提交
    • E
      tipc: remove obsolete flush of stale reassembly buffer · c0084138
      Erik Hugne 提交于
      Each link instance has a periodic job checking if there is a stale
      ongoing message reassembly associated to the link. If no new
      fragment has been received during the last 4*[link_tolerance] period,
      it is assumed the missing fragment will never arrive. As a consequence,
      the reassembly buffer is discarded, and a gap in the message sequence
      occurs.
      
      This assumption is wrong. After we abandoned our ambition to develop
      packet routing for multi-cluster networks, only single-hop packet
      transfer remains as an option. For those, all packets are guaranteed
      to be delivered in sequence to the defragmentation layer. Any failure
      to achieve sequenced delivery will eventually lead to link reset, and
      the reassembly buffer will be flushed anyway.
      
      So we just remove this periodic check, which is now obsolete.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      [PG: also delete get/inc_timer count, since they are now unused]
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      c0084138
  7. 23 11月, 2012 3 次提交
    • P
      tipc: delete TIPC_ADVANCED Kconfig variable · 94fc9c47
      Paul Gortmaker 提交于
      There used to be a time when TIPC had lots of Kconfig knobs the
      end user could alter, but they have all been made automatic or
      obsolete, with the exception of CONFIG_TIPC_PORTS.  This
      previously existing set of options was all hidden under the
      TIPC_ADVANCED setting, which does not exist in any code, but
      only in Kconfig scope.
      
      Having this now, just to hide the one remaining "advanced"
      option no longer makes sense.  Remove it.  Also get rid of the
      ifdeffery in the TIPC code that allowed for TIPC_PORTS to be
      possibly undefined.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      94fc9c47
    • Y
      tipc: eliminate an unnecessary cast of node variable · 4cb7d55a
      Ying Xue 提交于
      As the variable:node is currently defined to u32 type, it is
      unnecessary to cast its type to u32 again when using it.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      4cb7d55a
    • J
      tipc: introduce message to synchronize broadcast link · c64f7a6a
      Jon Maloy 提交于
      Upon establishing a first link between two nodes, there is
      currently a risk that the two endpoints will disagree on exactly
      which sequence number reception and acknowleding of broadcast
      packets should start.
      
      The following scenarios may happen:
      
      1: Node A sends an ACTIVATE message to B, telling it to start acking
         packets from sequence number N.
      2: Node A sends out broadcast N, but does not expect an acknowledge
         from B, since B is not yet in its broadcast receiver's list.
      3: Node A receives ACK for N from all nodes except B, and releases
         packet N.
      4: Node B receives the ACTIVATE, activates its link endpoint, and
         stores the value N as sequence number of first expected packet.
      5: Node B sends a NAME_DISTR message to A.
      6: Node A receives the NAME_DISTR message, and activates its endpoint.
         At this moment B is added to A's broadcast receiver's set.
         Node A also sets sequence number 0 as the first broadcast packet
         to be received from B.
      7: Node A sends broadcast N+1.
      8: B receives N+1, determines there is a gap in the sequence, since
         it is expecting N, and sends a NACK for N back to A.
      9: Node A has already released N, so no retransmission is possible.
         The broadcast link in direction A->B is stale.
      
      In addition to, or instead of, 7-9 above, the following may happen:
      
      10: Node B sends broadcast M > 0 to A.
      11: Node A receives M, falsely decides there must be a gap, since
          it is expecting packet 0, and asks for retransmission of packets
          [0,M-1].
      12: Node B has already released these packets, so the broadcast
          link is stale in direction B->A.
      
      We solve this problem by introducing a new unicast message type,
      BCAST_PROTOCOL/STATE, to convey the sequence number of the next
      sent broadcast packet to the other endpoint, at exactly the moment
      that endpoint is added to the own node's broadcast receivers list,
      and before any other unicast messages are permitted to be sent.
      
      Furthermore, we don't allow any node to start receiving and
      processing broadcast packets until this new synchronization
      message has been received.
      
      To maintain backwards compatibility, we still open up for
      broadcast reception if we receive a NAME_DISTR message without
      any preceding broadcast sync message. In this case, we must
      assume that the other end has an older code version, and will
      never send out the new synchronization message. Hence, for mixed
      old and new nodes, the issue arising in 7-12 of the above may
      happen with the same probability as before.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      c64f7a6a
  8. 22 11月, 2012 6 次提交
    • Y
      tipc: rename supported flag to recv_permitted · 389dd9bc
      Ying Xue 提交于
      Rename the "supported" flag in bclink structure to "recv_permitted"
      to better reflect what it is used for. When this flag is set for a
      given node, we are permitted to receive and acknowledge broadcast
      messages from that node.  Convert it to a bool at the same time,
      since it is not used to store any numerical values.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      389dd9bc
    • Y
      tipc: remove supportable flag from bclink structure · 818f4da5
      Ying Xue 提交于
      The "supportable" flag in bclink structure is a compatibility flag
      indicating whether a peer node is capable of receiving TIPC broadcast
      messages. However, all TIPC versions since tipc-1.5, and after the
      inclusion in the upstream Linux kernel in 2006, support this capability.
      It is highly unlikely that anybody is still using such an old
      version of TIPC, let alone that they want to mix it with TIPC-2.0
      nodes. Therefore, we now remove the "supportable" flag.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      818f4da5
    • Y
      tipc: remove the bearer congestion mechanism · 3c294cb3
      Ying Xue 提交于
      Currently at the TIPC bearer layer there is the following congestion
      mechanism:
      
      Once sending packets has failed via that bearer, the bearer will be
      flagged as being in congested state at once. During bearer congestion,
      all packets arriving at link will be queued on the link's outgoing
      buffer.  When we detect that the state of bearer congestion has
      relaxed (e.g. some packets are received from the bearer) we will try
      our best to push all packets in the link's outgoing buffer until the
      buffer is empty, or until the bearer is congested again.
      
      However, in fact the TIPC bearer never receives any feedback from the
      device layer whether a send was successful or not, so it must always
      assume it was successful. Therefore, the bearer congestion mechanism
      as it exists currently is of no value.
      
      But the bearer blocking state is still useful for us. For example,
      when the physical media goes down/up, we need to change the state of
      the links bound to the bearer.  So the code maintaing the state
      information is not removed.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      3c294cb3
    • Y
      tipc: wake up all waiting threads at socket shutdown · 75031151
      Ying Xue 提交于
      When a socket is shut down, we should wake up all thread sleeping on
      it, instead of just one of them. Otherwise, when several threads are
      polling the same socket, and one of them does shutdown(), the
      remaining threads may end up sleeping forever.
      
      Also, to align socket usage with common practice in other stacks, we
      use one of the common socket callback handlers, sk_state_change(),
      to wake up pending users. This is similar to the usage in e.g.
      inet_shutdown(). [net/ipv4/af_inet.c].
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      75031151
    • E
      tipc: return POLLOUT for sockets in an unconnected state · c4fc298a
      Erik Hugne 提交于
      If an implied connect is attempted on a nonblocking STREAM/SEQPACKET
      socket during link congestion, the connect message will be discarded
      and sendmsg will return EAGAIN. This is normal behavior, and the
      application is expected to poll the socket until POLLOUT is set,
      after which the connection attempt can be retried.
      However, the POLLOUT flag is never set for unconnected sockets and
      poll() always returns a zero mask. The application is then left without
      a trigger for when it can make another attempt at sending the message.
      
      The solution is to check if we're polling on an unconnected socket
      and set the POLLOUT flag if the TIPC port owned by this socket
      is not congested. The TIPC ports waiting on a specific link will be
      marked as 'not congested' when the link congestion have abated.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      c4fc298a
    • Y
      tipc: fix race/inefficiencies in poll/wait behaviour · f288bef4
      Ying Xue 提交于
      When an application blocks at poll/select on a TIPC socket
      while requesting a specific event mask, both the filter_rcv() and
      wakeupdispatch() case will wake it up unconditionally whenever
      the state changes (i.e an incoming message arrives, or congestion
      has subsided).  No mask is used.
      
      To avoid this, we populate sk->sk_data_ready and sk->sk_write_space
      with tipc_data_ready and tipc_write_space respectively, which makes
      tipc more in alignment with the rest of the networking code.  These
      pass the exact set of possible events to the waker in fs/select.c
      hence avoiding waking up blocked processes unnecessarily.
      
      In doing so, we uncover another issue -- that there needs to be a
      memory barrier in these poll/receive callbacks, otherwise we are
      subject to the the same race as documented above wq_has_sleeper()
      [in commit a57de0b4 "net: adding memory barrier to the poll and
      receive callbacks"].  So we need to replace poll_wait() with
      sock_poll_wait() and use rcu protection for the sk->sk_wq pointer
      in these two new functions.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      f288bef4
  9. 04 11月, 2012 1 次提交
  10. 05 10月, 2012 1 次提交
    • E
      tipc: prevent dropped connections due to rcvbuf overflow · e57edf6b
      Erik Hugne 提交于
      When large buffers are sent over connected TIPC sockets, it
      is likely that the sk_backlog will be filled up on the
      receiver side, but the TIPC flow control mechanism is happily
      unaware of this since that is based on message count.
      
      The sender will receive a TIPC_ERR_OVERLOAD message when this occurs
      and drop it's side of the connection, leaving it stale on
      the receiver end.
      
      By increasing the sk_rcvbuf to a 'worst case' value, we avoid the
      overload caused by a full backlog queue and the flow control
      will work properly.
      
      This worst case value is the max TIPC message size times
      the flow control window, multiplied by two because a sender
      will transmit up to double the window size before a port is marked
      congested.
      We multiply this by 2 to account for the sk_buff and other overheads.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e57edf6b
  11. 19 9月, 2012 1 次提交
  12. 11 9月, 2012 1 次提交
  13. 20 8月, 2012 9 次提交
  14. 14 7月, 2012 2 次提交
    • E
      tipc: remove print_buf and deprecated log buffer code · 869dd466
      Erik Hugne 提交于
      The internal log buffer handling functions can now safely be
      removed since there is no code using it anymore.  Requests to
      interact with the internal tipc log buffer over netlink (in
      config.c) will report 'obsolete command'.
      
      This represents the final removal of any references to a
      struct print_buf, and the removal of the struct itself.
      We also get rid of a TIPC specific Kconfig in the process.
      
      Finally, log.h is removed since it is not needed anymore.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      869dd466
    • E
      tipc: phase out most of the struct print_buf usage · dc1aed37
      Erik Hugne 提交于
      The tipc_printf is renamed to tipc_snprintf, as the new name
      describes more what the function actually does.  It is also
      changed to take a buffer and length parameter and return
      number of characters written to the buffer.  All callers of
      this function that used to pass a print_buf are updated.
      
      Final removal of the struct print_buf itself will be done
      synchronously with the pending removal of the deprecated
      logging code that also was using it.
      
      Functions that build up a response message with a list of
      ports, nametable contents etc. are changed to return the number
      of characters written to the output buffer. This information
      was previously hidden in a field of the print_buf struct, and
      the number of chars written was fetched with a call to
      tipc_printbuf_validate.  This function is removed since it
      is no longer referenced nor needed.
      
      A generic max size ULTRA_STRING_MAX_LEN is defined, named
      in keeping with the existing TIPC_TLV_ULTRA_STRING, and the
      various definitions in port, link and nametable code that
      largely duplicated this information are removed.  This means
      that amount of link statistics that can be returned is now
      increased from 2k to 32k.
      
      The buffer overflow check is now done just before the reply
      message is passed over netlink or TIPC to a remote node and
      the message indicating a truncated buffer is changed to a less
      dramatic one (less CAPS), placed at the end of the message.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      dc1aed37