1. 06 2月, 2015 2 次提交
    • J
      tipc: simplify socket multicast reception · 3c724acd
      Jon Paul Maloy 提交于
      The structure 'tipc_port_list' is used to collect port numbers
      representing multicast destination socket on a receiving node.
      The list is not based on a standard linked list, and is in reality
      optimized for the uncommon case that there are more than one
      multicast destinations per node. This makes the list handling
      unecessarily complex, and as a consequence, even the socket
      multicast reception becomes more complex.
      
      In this commit, we replace 'tipc_port_list' with a new 'struct
      tipc_plist', which is based on a standard list. We give the new
      list stack (push/pop) semantics, someting that simplifies
      the implementation of the function tipc_sk_mcast_rcv().
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c724acd
    • J
      tipc: resolve race problem at unicast message reception · c637c103
      Jon Paul Maloy 提交于
      TIPC handles message cardinality and sequencing at the link layer,
      before passing messages upwards to the destination sockets. During the
      upcall from link to socket no locks are held. It is therefore possible,
      and we see it happen occasionally, that messages arriving in different
      threads and delivered in sequence still bypass each other before they
      reach the destination socket. This must not happen, since it violates
      the sequentiality guarantee.
      
      We solve this by adding a new input buffer queue to the link structure.
      Arriving messages are added safely to the tail of that queue by the
      link, while the head of the queue is consumed, also safely, by the
      receiving socket. Sequentiality is secured per socket by only allowing
      buffers to be dequeued inside the socket lock. Since there may be multiple
      simultaneous readers of the queue, we use a 'filter' parameter to reduce
      the risk that they peek the same buffer from the queue, hence also
      reducing the risk of contention on the receiving socket locks.
      
      This solves the sequentiality problem, and seems to cause no measurable
      performance degradation.
      
      A nice side effect of this change is that lock handling in the functions
      tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
      will enable future simplifications of those functions.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c637c103
  2. 13 1月, 2015 4 次提交
  3. 09 1月, 2015 1 次提交
    • Y
      tipc: convert tipc reference table to use generic rhashtable · 07f6c4bc
      Ying Xue 提交于
      As tipc reference table is statically allocated, its memory size
      requested on stack initialization stage is quite big even if the
      maximum port number is just restricted to 8191 currently, however,
      the number already becomes insufficient in practice. But if the
      maximum ports is allowed to its theory value - 2^32, its consumed
      memory size will reach a ridiculously unacceptable value. Apart from
      this, heavy tipc users spend a considerable amount of time in
      tipc_sk_get() due to the read-lock on ref_table_lock.
      
      If tipc reference table is converted with generic rhashtable, above
      mentioned both disadvantages would be resolved respectively: making
      use of the new resizable hash table can avoid locking on the lookup;
      smaller memory size is required at initial stage, for example, 256
      hash bucket slots are requested at the beginning phase instead of
      allocating the entire 8191 slots in old mode. The hash table will
      grow if entries exceeds 75% of table size up to a total table size
      of 1M, and it will automatically shrink if usage falls below 30%,
      but the minimum table size is allowed down to 256.
      
      Also converts ref_table_lock to a separate mutex to protect hash table
      mutations on write side. Lastly defers the release of the socket
      reference using call_rcu() to allow using an RCU read-side protected
      call to rhashtable_lookup().
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NErik Hugne <erik.hugne@ericsson.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07f6c4bc
  4. 22 11月, 2014 2 次提交
  5. 24 8月, 2014 5 次提交
  6. 17 7月, 2014 1 次提交
  7. 28 6月, 2014 2 次提交
    • J
      tipc: simplify connection congestion handling · 60120526
      Jon Paul Maloy 提交于
      As a consequence of the recently introduced serialized access
      to the socket in commit 8d94168a761819d10252bab1f8de6d7b202c3baa
      ("tipc: same receive code path for connection protocol and data
      messages") we can make a number of simplifications in the
      detection and handling of connection congestion situations.
      
      - We don't need to keep two counters, one for sent messages and one
        for acked messages. There is no longer any risk for races between
        acknowledge messages arriving in BH and data message sending
        running in user context. So we merge this into one counter,
        'sent_unacked', which is incremented at sending and subtracted
        from at acknowledge reception.
      
      - We don't need to set the 'congested' field in tipc_port to
        true before we sent the message, and clear it when sending
        is successful. (As a matter of fact, it was never necessary;
        the field was set in link_schedule_port() before any wakeup
        could arrive anyway.)
      
      - We keep the conditions for link congestion and connection connection
        congestion separated. There would otherwise be a risk that an arriving
        acknowledge message may wake up a user sleeping because of link
        congestion.
      
      - We can simplify reception of acknowledge messages.
      
      We also make some cosmetic/structural changes:
      
      - We rename the 'congested' field to the more correct 'link_cong´.
      
      - We rename 'conn_unacked' to 'rcv_unacked'
      
      - We move the above mentioned fields from struct tipc_port to
        struct tipc_sock.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60120526
    • J
      tipc: clean up connection protocol reception function · ac0074ee
      Jon Paul Maloy 提交于
      We simplify the code for receiving connection probes, leveraging the
      recently introduced tipc_msg_reverse() function. We also stick to
      the principle of sending a possible response message directly from
      the calling (tipc_sk_rcv or backlog_rcv) functions, hence making
      the call chain shallower and easier to follow.
      
      We make one small protocol change here, allowed according to
      the spec. If a protocol message arrives from a remote socket that
      is not the one we are connected to, we are currently generating a
      connection abort message and send it to the source. This behavior
      is unnecessary, and might even be a security risk, so instead we
      now choose to only ignore the message. The consequnce for the sender
      is that he will need longer time to discover his mistake (until the
      next timeout), but this is an extreme corner case, and may happen
      anyway under other circumstances, so we deem this change acceptable.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac0074ee
  8. 15 5月, 2014 2 次提交
    • J
      tipc: merge port message reception into socket reception function · 9816f061
      Jon Paul Maloy 提交于
      In order to reduce complexity and save a call level during message
      reception at port/socket level, we remove the function tipc_port_rcv()
      and merge its functionality into tipc_sk_rcv().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9816f061
    • J
      tipc: compensate for double accounting in socket rcv buffer · 4f4482dc
      Jon Paul Maloy 提交于
      The function net/core/sock.c::__release_sock() runs a tight loop
      to move buffers from the socket backlog queue to the receive queue.
      
      As a security measure, sk_backlog.len of the receiving socket
      is not set to zero until after the loop is finished, i.e., until
      the whole backlog queue has been transferred to the receive queue.
      During this transfer, the data that has already been moved is counted
      both in the backlog queue and the receive queue, hence giving an
      incorrect picture of the available queue space for new arriving buffers.
      
      This leads to unnecessary rejection of buffers by sk_add_backlog(),
      which in TIPC leads to unnecessarily broken connections.
      
      In this commit, we compensate for this double accounting by adding
      a counter that keeps track of it. The function socket.c::backlog_rcv()
      receives buffers one by one from __release_sock(), and adds them to the
      socket receive queue. If the transfer is successful, it increases a new
      atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
      transferred buffer. If a new buffer arrives during this transfer and
      finds the socket busy (owned), we attempt to add it to the backlog.
      However, when sk_add_backlog() is called, we adjust the 'limit'
      parameter with the value of the new counter, so that the risk of
      inadvertent rejection is eliminated.
      
      It should be noted that this change does not invalidate the original
      purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
      upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
      doesn't respect the send window) keeps pumping in buffers to
      sk_add_backlog(), he will eventually reach an upper limit,
      (2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
      to the backlog, and the connection will be broken. Ordinary, well-
      behaved senders will never reach this buffer limit at all.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f4482dc
  9. 13 3月, 2014 3 次提交
  10. 18 6月, 2013 1 次提交
    • Y
      tipc: change socket buffer overflow control to respect sk_rcvbuf · cc79dd1b
      Ying Xue 提交于
      As per feedback from the netdev community, we change the buffer
      overflow protection algorithm in receiving sockets so that it
      always respects the nominal upper limit set in sk_rcvbuf.
      
      Instead of scaling up from a small sk_rcvbuf value, which leads to
      violation of the configured sk_rcvbuf limit, we now calculate the
      weighted per-message limit by scaling down from a much bigger value,
      still in the same field, according to the importance priority of the
      received message.
      
      To allow for administrative tunability of the socket receive buffer
      size, we create a tipc_rmem sysctl variable to allow the user to
      configure an even bigger value via sysctl command.  It is a size of
      three (min/default/max) to be consistent with things like tcp_rmem.
      
      By default, the value initialized in tipc_rmem[1] is equal to the
      receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message.
      This value is also set as the default value of sk_rcvbuf.
      Originally-by: NJon Maloy <jon.maloy@ericsson.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      [Ying: added sysctl variation to Jon's original patch]
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      [PG: don't compile sysctl.c if not config'd; add Documentation]
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc79dd1b
  11. 14 7月, 2012 3 次提交
    • E
      tipc: remove print_buf and deprecated log buffer code · 869dd466
      Erik Hugne 提交于
      The internal log buffer handling functions can now safely be
      removed since there is no code using it anymore.  Requests to
      interact with the internal tipc log buffer over netlink (in
      config.c) will report 'obsolete command'.
      
      This represents the final removal of any references to a
      struct print_buf, and the removal of the struct itself.
      We also get rid of a TIPC specific Kconfig in the process.
      
      Finally, log.h is removed since it is not needed anymore.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      869dd466
    • E
      tipc: phase out most of the struct print_buf usage · dc1aed37
      Erik Hugne 提交于
      The tipc_printf is renamed to tipc_snprintf, as the new name
      describes more what the function actually does.  It is also
      changed to take a buffer and length parameter and return
      number of characters written to the buffer.  All callers of
      this function that used to pass a print_buf are updated.
      
      Final removal of the struct print_buf itself will be done
      synchronously with the pending removal of the deprecated
      logging code that also was using it.
      
      Functions that build up a response message with a list of
      ports, nametable contents etc. are changed to return the number
      of characters written to the output buffer. This information
      was previously hidden in a field of the print_buf struct, and
      the number of chars written was fetched with a call to
      tipc_printbuf_validate.  This function is removed since it
      is no longer referenced nor needed.
      
      A generic max size ULTRA_STRING_MAX_LEN is defined, named
      in keeping with the existing TIPC_TLV_ULTRA_STRING, and the
      various definitions in port, link and nametable code that
      largely duplicated this information are removed.  This means
      that amount of link statistics that can be returned is now
      increased from 2k to 32k.
      
      The buffer overflow check is now done just before the reply
      message is passed over netlink or TIPC to a remote node and
      the message indicating a truncated buffer is changed to a less
      dramatic one (less CAPS), placed at the end of the message.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      dc1aed37
    • E
      tipc: simplify print buffer handling in tipc_printf · e2dbd601
      Erik Hugne 提交于
      tipc_printf was previously used both to construct debug traces
      and to append data to buffers that should be sent over netlink
      to the tipc-config application.  A global print_buffer was
      used to format the string before it was copied to the actual
      output buffer.  This could lead to concurrent access of the
      global print_buffer, which then had to be lock protected.
      This is simplified by changing tipc_printf to append data
      directly to the output buffer using vscnprintf.
      
      With the new implementation of tipc_printf, there is no longer
      any risk of concurrent access to the internal log buffer, so
      the lock (and the comments describing it) are no longer
      strictly necessary.  However, there are still a few functions
      that do grab this lock before resizing/dumping the log
      buffer.  We leave the lock, and these functions untouched since
      they will be removed with a subsequent commit that drops the
      deprecated log buffer handling code
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      e2dbd601
  12. 01 5月, 2012 1 次提交
    • P
      tipc: compress out gratuitous extra carriage returns · 617d3c7a
      Paul Gortmaker 提交于
      Some of the comment blocks are floating in limbo between two
      functions, or between blocks of code.  Delete the extra line
      feeds between any comment and its associated following block
      of code, to be consistent with the majority of the rest of
      the kernel.  Also delete trailing newlines at EOF and fix
      a couple trivial typos in existing comments.
      
      This is a 100% cosmetic change with no runtime impact.  We get
      rid of over 500 lines of non-code, and being blank line deletes,
      they won't even show up as noise in git blame.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      617d3c7a
  13. 25 2月, 2012 1 次提交
  14. 02 1月, 2011 3 次提交
  15. 17 10月, 2010 1 次提交
  16. 24 9月, 2010 1 次提交
  17. 19 3月, 2009 1 次提交
  18. 05 5月, 2008 6 次提交