  1. 28 Apr, 2019 1 commit
    • netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek committed
      Even though the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)

      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
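
To illustrate why userspace that reads nlattr::nla_type directly can break once the flag is set, here is a minimal sketch in userspace C. The EX_ constants copy the values of NLA_F_NESTED, NLA_F_NET_BYTEORDER and NLA_TYPE_MASK from <linux/netlink.h>; the ex_* helpers are hypothetical names invented for this example and are not kernel or libmnl APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the UAPI definitions in <linux/netlink.h>; they are
 * redefined under EX_ names only so the sketch builds anywhere. */
#define EX_NLA_F_NESTED        (1 << 15)
#define EX_NLA_F_NET_BYTEORDER (1 << 14)
#define EX_NLA_TYPE_MASK       (~(EX_NLA_F_NESTED | EX_NLA_F_NET_BYTEORDER))

/* Hypothetical stand-in for what the new nla_nest_start() wrapper does
 * to the attribute type before starting the nest. */
static inline uint16_t ex_nested_type(uint16_t attrtype)
{
	return (uint16_t)(attrtype | EX_NLA_F_NESTED);
}

/* Robust reader: masks out the flag bits before looking at the type. */
static inline uint16_t ex_masked_type(uint16_t nla_type)
{
	return (uint16_t)(nla_type & EX_NLA_TYPE_MASK);
}

/* Fragile reader: compares the raw field, so it stops matching once
 * the kernel starts setting NLA_F_NESTED. */
static inline int ex_raw_match(uint16_t nla_type, uint16_t wanted)
{
	return nla_type == wanted;
}
```

For attribute type 3, ex_nested_type(3) yields 0x8003. A reader going through ex_masked_type() still sees 3, while ex_raw_match() against 3 no longer matches, which is the compatibility risk that motivates keeping nla_nest_start_noflag() for existing callers.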
  2. 17 Apr, 2019 1 commit
    • tipc: fix link established but not in session · f7a93780
      Tuong Lien committed
      According to the link FSM, when a link endpoint receives a RESET_MSG (a
      traditional one, without the 'stopping' bit) from its peer, it moves to
      the PEER_RESET state and raises a LINK_DOWN event, which then resets the
      link itself. Its state becomes ESTABLISHING after the reset event, and
      the link is re-established soon after this endpoint starts sending
      ACTIVATE_MSG to the peer.

      There is no problem with this mechanism as such. However, the link reset
      clears the link 'in_session' flag (along with other important link data,
      such as the link 'mtu') that was correctly set up in the first step
      (i.e. when this endpoint received the peer's RESET_MSG). As a result, the
      link becomes ESTABLISHED without the 'in_session' flag set, and all
      STATE_MSGs from its peer are dropped in link_validate_msg(). This means
      the link is not synced and will sooner or later fail.

      Since the link reset is obviously needed for a new link session (this is
      also true in other situations), the problem here is that the link is
      re-established a bit too early, while the link endpoints are not yet
      really in sync. The commit forces a resync, as already done in the
      previous commit 91986ee1 ("tipc: fix link session and re-establish
      issues"), by simply varying the link 'peer_session' value in
      link_reset().

      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 05 Apr, 2019 3 commits
    • tipc: adapt link failover for new Gap-ACK algorithm · 58ee86b8
      Tuong Lien committed
      In commit 0ae955e2656d ("tipc: improve TIPC throughput by Gap ACK
      blocks"), we enhanced the link transmq by releasing as many packets as
      possible based on the multiple ACKs from the peer node. This also means
      the queue is now non-linear and the peer link deferdq becomes vital.

      In the case of link failover, however, all messages in the link transmq
      need to be transmitted as tunnel messages in such a way that message
      sequentiality and cardinality per sender are preserved. This requires us
      to maintain the link deferdq somehow, so that when the tunnel messages
      arrive, the inner user messages along with the ones in the deferdq are
      delivered to the upper layer correctly.
      
      The commit accomplishes this by defining a new queue in the TIPC link
      structure to hold the old link deferdq when link failover happens and
      process it upon receipt of tunnel messages.
      
      Also, in the case of link syncing, the link deferdq will not be purged
      to avoid unnecessary retransmissions that in the worst case will fail
      because the packets might have been freed on the sending side.

      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: reduce duplicate packets for unicast traffic · 382f598f
      Tuong Lien committed
      For unicast transmission, the current NACK sending algorithm is
      over-active: it forces the sending side to retransmit a packet that is
      not really lost but has just arrived at the receiving side with some
      delay, or even to retransmit the same packets that have already been
      retransmitted before. As a result, many duplicates are observed even
      under normal conditions, i.e. without packet loss.

      One example case: node1 transmits 1 2 3 4 10 5 6 7 8 9. When node2
      receives packet #10, it puts it into the deferdq. When packet #5 arrives,
      node2 sends a NACK with gap [6 - 9]. However, shortly after that, when
      packet #6 arrives, it pulls packet #10 out of the deferdq, but it is
      still out of order, so it sends another NACK with gap [7 - 9], and so
      on. In the end, node1 has to retransmit packets 5 6 7 8 9 a number of
      times although none of them was actually lost, hence the duplicates.

      This commit reduces duplicates by changing the condition for sending a
      NACK and by restricting retransmissions of individual packets via a
      timer of about 1 ms. Note, however, that an overly strict condition for
      NACKs or too long a timeout value for retransmissions will reduce
      performance. The criteria in this commit were found to be effective at
      reducing duplicates without affecting performance.

      tipc_link_rcv() is also improved to only dequeue an skb from the link
      deferdq if it is expected (i.e. its seqno <= rcv_nxt).

      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: improve TIPC throughput by Gap ACK blocks · 9195948f
      Tuong Lien committed
      During unicast link transmission, it is very often observed that,
      because of one or a few lost or out-of-order packets, the sending side
      quickly reaches the send window limit and must wait for the packets to
      arrive at the receiving side or, in the worst case, for a retransmission
      to be done first. The sending side cannot release many of the subsequent
      packets in its transmq even though all of them might already have been
      received by the receiving side.
      That is, with one or two packets out of order or lost, dozens of packets
      have to wait, which obviously reduces the overall throughput.
      
      This commit introduces an algorithm to overcome this by using "Gap ACK
      blocks". Basically, a Gap ACK block consists of <ack, gap> numbers that
      describe the link deferdq, where packets have been received by the
      receiving side but with gaps, for example:
      
            link deferdq: [1 2 3 4      10 11      13 14 15       20]
      --> Gap ACK blocks:       <4, 5>,   <11, 1>,      <15, 4>, <20, 0>
      
      The Gap ACK blocks will be sent to the sending side along with the
      traditional ACK or NACK message. Immediately when receiving the message
      the sending side will now not only release from its transmq the packets
      ack-ed by the ACK but also by the Gap ACK blocks! So, more packets can
      be enqueued and transmitted.
      In addition, the sending side can now do "multi-retransmissions"
      according to the Gaps reported in the Gap ACK blocks.
      
      The new algorithm has been verified to greatly improve TIPC throughput,
      especially under packet loss conditions.

      So far, a maximum of 32 blocks has been quite enough, with no "Too few
      Gap ACK blocks" reports at a 5.0% packet loss rate. This number can be
      increased in the future if needed.
      
      Also, the patch is backward compatible.

      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
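
The <ack, gap> encoding from the deferdq example in this commit message can be sketched in plain C as follows. This is only an illustration derived from that example, not the kernel implementation: the struct, the function name and the array-based queue are hypothetical, and sequence number wrap-around is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_GAP_ACK_BLKS 32 /* the limit quoted in the commit message */

struct ex_gap_ack {
	uint16_t ack; /* last seqno of a contiguous run in the deferdq */
	uint16_t gap; /* packets missing before the next run; 0 for the last */
};

/* Build Gap ACK blocks from a strictly increasing list of deferred
 * sequence numbers. Returns the number of blocks written to out[]. */
static size_t ex_build_gap_acks(const uint16_t *seq, size_t n,
				struct ex_gap_ack *out, size_t max)
{
	size_t cnt = 0;

	for (size_t i = 0; i < n && cnt < max; i++) {
		int last = (i + 1 == n);

		if (!last && seq[i + 1] == seq[i] + 1)
			continue; /* still inside a contiguous run */

		out[cnt].ack = seq[i];
		out[cnt].gap = last ? 0 : (uint16_t)(seq[i + 1] - seq[i] - 1);
		cnt++;
	}
	return cnt;
}
```

Feeding it the deferdq from the example, [1 2 3 4 10 11 13 14 15 20], produces the four blocks <4, 5>, <11, 1>, <15, 4>, <20, 0>. A sender receiving these could release every packet up to each ack value and retransmit only the packets inside the reported gaps.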
  4. 20 Mar, 2019 1 commit
    • tipc: support broadcast/replicast configurable for bc-link · 02ec6caf
      Hoang Le committed
      Currently, a multicast stream uses either broadcast or replicast as
      transmission method, based on the ratio between the number of actual
      destination nodes and the cluster size.
      
      However, when an L2 interface (e.g., VXLAN) provides pseudo
      broadcast support, this becomes very inefficient, as it blindly
      replicates multicast packets to all cluster/subnet nodes,
      irrespective of whether they host actual target sockets or not.
      
      The TIPC multicast algorithm is able to distinguish real destination
      nodes from other nodes, and hence provides a smarter and more
      efficient method for transferring multicast messages than
      pseudo broadcast can do.
      
      Because of this, we now make it possible for users to force
      the broadcast link to permanently switch to using replicast,
      irrespective of which capabilities the bearer provides,
      or pretends to provide.
      Conversely, we also make it possible to force the broadcast link
      to always use true broadcast. While maybe less useful in
      deployed systems, this may at least be useful for testing the
      broadcast algorithm in small clusters.
      
      We retain the current AUTOSELECT ability, i.e., to let the broadcast link
      automatically select which algorithm to use, and to switch back and forth
      between broadcast and replicast as the ratio between destination
      node number and cluster size changes. This remains the default method.
      
      Furthermore, we make it possible to configure the threshold ratio for
      such switches. The default ratio is now set to 10%, down from 25% in the
      earlier implementation.

      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
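
The three modes described in this commit message (forced broadcast, forced replicast, and AUTOSELECT with a configurable threshold ratio) can be pictured with the small sketch below. All names are hypothetical, and the direction and boundary of the comparison (broadcast once the destinations exceed the given percentage of the cluster) are assumptions made for illustration; the commit message only states that the choice depends on this ratio.

```c
#include <assert.h>

/* Hypothetical mode names; not the identifiers used in the kernel. */
enum ex_bc_mode {
	EX_BC_AUTOSELECT,  /* default: pick per message from the ratio */
	EX_BC_FORCE_BCAST, /* always use true broadcast */
	EX_BC_FORCE_RCAST  /* always use replicast */
};

/* Returns 1 to send via broadcast, 0 to send via replicast.
 * ratio_pct is the threshold in percent (10 by default per the commit). */
static int ex_use_broadcast(enum ex_bc_mode mode, unsigned int dests,
			    unsigned int cluster_size, unsigned int ratio_pct)
{
	if (mode == EX_BC_FORCE_BCAST)
		return 1;
	if (mode == EX_BC_FORCE_RCAST)
		return 0;

	/* AUTOSELECT (assumed rule): broadcast only when the destinations
	 * make up more than ratio_pct percent of the cluster. */
	return dests * 100 > cluster_size * ratio_pct;
}
```

Under these assumptions, 5 destinations in a 100-node cluster with the default 10% threshold select replicast, while 20 destinations select broadcast; the two forced modes ignore the ratio entirely.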
  5. 12 Feb, 2019 2 commits
    • tipc: fix link session and re-establish issues · 91986ee1
      Tuong Lien committed
      When a link endpoint is re-created (e.g. after a node reboot or
      interface reset), the link session number is randomized, and the peer
      endpoint is synced with this new session number before the link is
      re-established.

      However, there is a shortcoming in this mechanism that can lead to the
      link never being re-established, or failing later on. It happens when
      the peer endpoint is ready in the ESTABLISHING state, with the
      'peer_session' as well as the 'in_session' flag already set, but this
      link endpoint suddenly leaves. When it comes back with a random session
      number, there are two possible situations:
      
      1/ If the random session number is larger than (or equal to) the
      previous one, the peer endpoint will be updated with this new session
      upon receipt of a RESET_MSG from this endpoint, and the link can be re-
      established as normal. Otherwise, all the RESET_MSGs from this endpoint
      will be rejected by the peer. In turn, when this link endpoint receives
      one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start
      to send STATE_MSGs, but again these messages will be dropped by the
      peer due to wrong session.
      The peer link endpoint can still become ESTABLISHED after receiving a
      traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
      NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
      will be forced down sooner or later!
      
      Even in case the random session number is larger than the previous one,
      it can be that the ACTIVATE_MSG from the peer arrives first, and this
      link endpoint moves quickly to ESTABLISHED without sending out any
      RESET_MSG yet. Consequently, the peer link will not be updated with the
      new session number, and the same link failure scenario as above will
      happen.
      
      2/ Another situation is that the peer link endpoint was reset for some
      reason in the meantime, so its link state was set to RESET from
      ESTABLISHING while still in session, i.e. the 'in_session' flag was not
      reset...
      Now, if the random session number from this endpoint is less than the
      previous one, all the RESET_MSGs from this endpoint will be rejected by
      the peer. In the other direction, when this link endpoint receives a
      RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
      ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
      As a result, the link cannot be re-established but gets stuck with this
      link endpoint in state ESTABLISHING and the peer in RESET!
      
      Solution:
      =========
      
      This link endpoint should not go directly to ESTABLISHED when getting an
      ACTIVATE_MSG from the peer, since that message may belong to the old
      session if the link was re-created. To ensure that the session is
      correct before the link is re-established, the peer endpoint in the
      ESTABLISHING state sends back the last session number in ACTIVATE_MSG
      for verification at this endpoint. Then, if needed, a new and more
      appropriate session number is generated to force a re-sync first.

      In addition, when a link in the ESTABLISHING state is reset, its state
      moves to RESET according to the link FSM, and the 'in_session' flag (and
      the other data) is reset as in a normal link reset. The link is also
      deleted if requested.
      
      The solution is backward compatible.

      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix skb may be leaky in tipc_link_input · 7384b538
      Hoang Le committed
      When we free the skb in tipc_data_input(), we return 'false'. The freed
      skb is then still passed on to tipc_link_input() in tipc_link_rcv():

      <snip>
      int tipc_link_rcv:
      ...
          if (!tipc_data_input(l, skb, l->inputq))
              rc |= tipc_link_input(l, skb, l->inputq);
      </snip>

      Fix this by simply returning 'true' when the skb is freed. tipc_link_rcv()
      then skips the call to tipc_link_input() in the condition above.

      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <maloy@donjonn.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 24 Jan, 2019 1 commit
    • tipc: mark expected switch fall-throughs · f79e3365
      Gustavo A. R. Silva committed
      In preparation for enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      net/tipc/link.c:1125:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      net/tipc/socket.c:736:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      net/tipc/socket.c:2418:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enable
      -Wimplicit-fallthrough.

      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
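
As a rough illustration of what "marking" a fall-through means, the sketch below uses the comment form that GCC recognizes at warning level 3. The function ex_stages_from() is a hypothetical example and is unrelated to the TIPC code touched by the patch.

```c
#include <assert.h>

/* Counts how many stages run when execution starts at 'stage'.
 * Each earlier stage intentionally falls through to the later ones. */
static int ex_stages_from(int stage)
{
	int ran = 0;

	switch (stage) {
	case 0:
		ran++;
		/* fall through */
	case 1:
		ran++;
		/* fall through */
	case 2:
		ran++;
		break;
	default:
		break;
	}
	return ran;
}
```

Without the marker comments, building with -Wimplicit-fallthrough=3 would report "this statement may fall through" for cases 0 and 1, just like the warnings quoted in the commit message.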
  7. 20 Dec, 2018 3 commits
    • tipc: fix uninitialized value for broadcast retransmission · 05572271
      Hoang Le committed
      When sending broadcast messages on a heavily loaded system, there are a
      lot of unnecessary packet retransmissions. The issue is caused by a
      missing initialization of the retransmission criterion.

      To prevent this, simply initialize this retransmission criterion to the
      next 10 milliseconds.
      
      Fixes: 31c4f4cc ("tipc: improve broadcast retransmission algorithm")
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: add trace_events for tipc link · 26574db0
      Tuong Lien committed
      The commit adds the new trace_events for TIPC link object:
      
      trace_tipc_link_timeout()
      trace_tipc_link_fsm()
      trace_tipc_link_reset()
      trace_tipc_link_too_silent()
      trace_tipc_link_retrans()
      trace_tipc_link_bc_ack()
      trace_tipc_link_conges()
      
      And the traces for PROTOCOL messages at building and receiving:
      
      trace_tipc_proto_build()
      trace_tipc_proto_rcv()
      
      Note:
      a) The 'tipc_link_too_silent' event only happens when the
      'silent_intv_cnt' is about to reach the 'abort_limit' value (and the
      event is enabled). The benefit of this kind of event is that we get an
      early indication of a TIPC link loss due to timeout, and can then take
      the necessary troubleshooting actions.

      For example, to enable the 'tipc_proto_rcv' event when the 'too_silent'
      event occurs:
      
      echo 'enable_event:tipc:tipc_proto_rcv' > \
            events/tipc/tipc_link_too_silent/trigger
      
      And disable it when TIPC link is reset:
      
      echo 'disable_event:tipc:tipc_proto_rcv' > \
            events/tipc/tipc_link_reset/trigger
      
      b) The 'tipc_link_retrans' or 'tipc_link_bc_ack' event is useful to
      trace TIPC retransmission issues.
      
      In addition, the commit adds 'trace_tipc_list/link_dump()' at the
      'retransmission failure' case. Then, if the issue occurs, the link
      'transmq' along with the link data can be dumped for post-analysis.
      These dump events should be enabled by default since they only take
      effect when the failure happens.

      The same approach is also applied to the faulty case where the
      validation of a protocol message fails.

      Acked-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: enable tracepoints in tipc · b4b9771b
      Tuong Lien committed
      For the sake of debugging and tracing, the commit enables tracepoints in
      TIPC along with some general trace_events as shown below. It also
      defines some 'tipc_*_dump()' functions that allow dumping TIPC object
      data whenever needed, that is, for general debug purposes, i.e. not just
      for the trace_events.
      
      The following trace_events are now available:
      
      - trace_tipc_skb_dump(): traces and dumps TIPC msg & skb data,
        e.g. message type, user, droppable, skb truesize, cloned skb, etc.

      - trace_tipc_list_dump(): traces and dumps any TIPC buffers or
        queues, e.g. TIPC link transmq, socket receive queue, etc.

      - trace_tipc_sk_dump(): traces and dumps TIPC socket data, e.g.
        sk state, sk type, connection type, rmem_alloc, socket queues, etc.

      - trace_tipc_link_dump(): traces and dumps TIPC link data, e.g.
        link state, silent_intv_cnt, gap, bc_gap, link queues, etc.

      - trace_tipc_node_dump(): traces and dumps TIPC node data, e.g.
        node state, active links, capabilities, link entries, etc.
      
      How to use:
      Put the trace functions wherever we want to dump TIPC data or events.
      
      Note:
      a) The dump functions generate raw data only, in order to offload the
      trace event's processing. A tool or script may be required to parse the
      data, but this should be simple.

      b) The trace_tipc_*_dump() events should be reserved for failure cases
      only (e.g. the retransmission failure case) or for cases we do not
      expect to happen too often. We can then consider enabling these events
      by default, since they have almost no effect under normal conditions,
      but once the rare condition or failure occurs, we get the full dumped
      data for post-analysis.

      For other trace purposes, we can reuse these trace classes as templates
      for different events.
      
      c) A trace_event is only effective when we enable it. To enable the
      TIPC trace_events, echo 1 to 'enable' files in the events/tipc/
      directory in the 'debugfs' file system. Normally, they are located at:
      
      /sys/kernel/debug/tracing/events/tipc/
      
      For example:
      
      To enable the tipc_link_dump event:
      
      echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable
      
      To enable all the TIPC trace_events:
      
      echo 1 > /sys/kernel/debug/tracing/events/tipc/enable
      
      To collect the trace data:
      
      cat trace
      
      or
      
      cat trace_pipe > /trace.out &
      
      To disable all the TIPC trace_events:
      
      echo 0 > /sys/kernel/debug/tracing/events/tipc/enable
      
      To clear the trace buffer:
      
      echo > trace
      
      d) As with other trace_events, features like 'filter' or 'trigger' are
      also usable for the TIPC trace_events.
      For more details, have a look at:
      
      Documentation/trace/ftrace.txt
      
      MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc

      Acked-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 12 Nov, 2018 2 commits
    • tipc: fix link re-establish failure · 7ab412d3
      Jon Maloy committed
      When a link failure is detected locally, the link is reset, the flag
      link->in_session is set to false, and a RESET_MSG with the 'stopping'
      bit set is sent to the peer.
      
      The purpose of this bit is to inform the peer that this endpoint is
      just going down, and that the peer should handle the reception of this
      particular RESET message as a local failure. This forces the peer to
      accept another RESET or ACTIVATE message from this endpoint before it
      can re-establish the link. This again is necessary to ensure that
      link session numbers are properly exchanged before the link comes up
      again.
      
      If a failure is detected locally at the same time at the peer endpoint
      this will do the same, which is also a correct behavior.
      
      However, when receiving such messages, the endpoints will not
      distinguish between 'stopping' RESETs and ordinary ones when it comes
      to updating session numbers. Both endpoints will copy the received
      session number and set their 'in_session' flags to true at the
      reception, while they are still expecting another RESET from the
      peer before they can go ahead and re-establish. This is contradictory,
      since, after applying the validation check referred to below, the
      'in_session' flag will cause rejection of all such messages, and the
      link will never come up again.
      
      We now fix this by not only handling received RESET/STOPPING messages
      as a local failure, but also by omitting to set a new session number
      and the 'in_session' flag in such cases.
      
      Fixes: 7ea817f4 ("tipc: check session number before accepting link protocol messages")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: improve broadcast retransmission algorithm · 31c4f4cc
      LUU Duc Canh committed
      Currently, the broadcast retransmission algorithm is using the
      'prev_retr' field in struct tipc_link to time stamp the latest broadcast
      retransmission occasion. This helps to restrict retransmission of
      individual broadcast packets to max once per 10 milliseconds, even
      though all other criteria for retransmission are met.
      
      We now move this time stamp to the control block of each individual
      packet, and remove other limiting criteria. This simplifies the
      retransmission algorithm, and eliminates any risk of logical errors
      in selecting which packets can be retransmitted.

      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
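
The idea of keeping the retransmission time stamp per packet, rather than per link, can be sketched as below. This is a simplified userspace model under stated assumptions: the struct, field and function names are hypothetical, time is a plain millisecond counter instead of jiffies, and the 10 ms interval is passed in as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-packet control data; the kernel keeps the equivalent
 * information in the control block of each buffer. */
struct ex_pkt {
	uint16_t seqno;
	uint64_t next_retr_ms; /* earliest time the next retransmit may occur */
};

/* Returns 1 and re-arms the per-packet limit if the packet may be
 * retransmitted now, 0 if it was retransmitted too recently. */
static int ex_may_retransmit(struct ex_pkt *p, uint64_t now_ms,
			     uint64_t min_intv_ms)
{
	if (now_ms < p->next_retr_ms)
		return 0;

	p->next_retr_ms = now_ms + min_intv_ms;
	return 1;
}
```

With a 10 ms interval, a packet retransmitted at t = 100 ms is refused at t = 105 ms and allowed again at t = 110 ms, independently of what happened to any other packet in the queue.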
  9. 16 Oct, 2018 1 commit
    • tipc: initialize broadcast link stale counter correctly · 4af00f4c
      Jon Maloy committed
      In the commit referred to below we added link tolerance as an additional
      criterion for declaring broadcast transmission "stale" and resetting the
      unicast links to the affected node.

      Unfortunately, this 'improvement' introduced two bugs, each of which
      alone causes only limited problems, but which combined lead to seemingly
      stochastic unicast link resets, depending on the amount of broadcast
      traffic transmitted.
      
      The first issue, a missing initialization of the 'tolerance' field of
      the receiver broadcast link, was recently fixed by commit 047491ea
      ("tipc: set link tolerance correctly in broadcast link").
      
      The second issue, where we omit to reset the 'stale_cnt' field of
      the same link after a 'stale' period is over, leads to this counter
      accumulating over time, and in the absence of the 'tolerance' criterion
      leads to the above described symptoms. This commit adds the missing
      initialization.
      
      Fixes: a4dc70d4 ("tipc: extend link reset criteria for stale packet retransmission")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 12 Oct, 2018 1 commit
    • tipc: eliminate possible recursive locking detected by LOCKDEP · a1f8dd34
      Ying Xue committed
      When booting the kernel with the LOCKDEP option, the following warning was found:
      
      WARNING: possible recursive locking detected
      4.19.0-rc7+ #14 Not tainted
      --------------------------------------------
      swapper/0/1 is trying to acquire lock:
      00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh
      include/linux/spinlock.h:334 [inline]
      00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at:
      tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850
      
      but task is already holding lock:
      00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh
      include/linux/spinlock.h:334 [inline]
      00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
      tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&list->lock)->rlock#4);
        lock(&(&list->lock)->rlock#4);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by swapper/0/1:
       #0: 00000000f7539d34 (pernet_ops_rwsem){+.+.}, at:
      register_pernet_subsys+0x19/0x40 net/core/net_namespace.c:1051
       #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
      spin_lock_bh include/linux/spinlock.h:334 [inline]
       #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
      tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849
      
      stack backtrace:
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc7+ #14
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1af/0x295 lib/dump_stack.c:113
       print_deadlock_bug kernel/locking/lockdep.c:1759 [inline]
       check_deadlock kernel/locking/lockdep.c:1803 [inline]
       validate_chain kernel/locking/lockdep.c:2399 [inline]
       __lock_acquire+0xf1e/0x3c60 kernel/locking/lockdep.c:3411
       lock_acquire+0x1db/0x520 kernel/locking/lockdep.c:3900
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
       _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
       spin_lock_bh include/linux/spinlock.h:334 [inline]
       tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850
       tipc_link_bc_create+0xb5/0x1f0 net/tipc/link.c:526
       tipc_bcast_init+0x59b/0xab0 net/tipc/bcast.c:521
       tipc_init_net+0x472/0x610 net/tipc/core.c:82
       ops_init+0xf7/0x520 net/core/net_namespace.c:129
       __register_pernet_operations net/core/net_namespace.c:940 [inline]
       register_pernet_operations+0x453/0xac0 net/core/net_namespace.c:1011
       register_pernet_subsys+0x28/0x40 net/core/net_namespace.c:1052
       tipc_init+0x83/0x104 net/tipc/core.c:140
       do_one_initcall+0x109/0x70a init/main.c:885
       do_initcall_level init/main.c:953 [inline]
       do_initcalls init/main.c:961 [inline]
       do_basic_setup init/main.c:979 [inline]
       kernel_init_freeable+0x4bd/0x57f init/main.c:1144
       kernel_init+0x13/0x180 init/main.c:1063
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
      
      LOCKDEP complains because we hold l->wakeupq.lock and l->inputq->lock
      nested in the tipc_link_reset() function. In fact, it is unnecessary to
      move skb buffers from the l->wakeupq queue to the l->inputq queue while
      holding both locks at the same time. Instead, we can first move the skb
      buffers in the l->wakeupq queue to a temporary list, and then move the
      buffers from the temporary list to the l->inputq queue, which is also
      safe.
      
      Fixes: 3f32d0be ("tipc: lock wakeup & inputq at tipc_link_reset()")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
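
The two-step move through a temporary list can be modelled in userspace as follows. This is a toy sketch, not the kernel code: the queue type, the ex_* functions and the integer "lock" are hypothetical stand-ins for the skb queues and their spinlocks, used only to show that at most one lock is held at any time.

```c
#include <assert.h>
#include <stddef.h>

struct ex_node {
	struct ex_node *next;
	int id;
};

struct ex_queue {
	struct ex_node *head, *tail;
	int locked; /* toy stand-in for the queue's spinlock */
};

static int ex_locks_held; /* toy locks currently held */
static int ex_max_held;   /* high-water mark, to detect nesting */

static void ex_lock(struct ex_queue *q)
{
	assert(!q->locked);
	q->locked = 1;
	if (++ex_locks_held > ex_max_held)
		ex_max_held = ex_locks_held;
}

static void ex_unlock(struct ex_queue *q)
{
	assert(q->locked);
	q->locked = 0;
	ex_locks_held--;
}

static void ex_enqueue(struct ex_queue *q, struct ex_node *n)
{
	n->next = NULL;
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
}

/* Append everything in src to dst and leave src empty. */
static void ex_splice_tail(struct ex_queue *src, struct ex_queue *dst)
{
	if (!src->head)
		return;
	if (dst->tail)
		dst->tail->next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	src->head = src->tail = NULL;
}

/* Move wakeupq to inputq via a private list, one lock at a time. */
static void ex_move_wakeupq(struct ex_queue *wakeupq, struct ex_queue *inputq)
{
	struct ex_queue tmp = { NULL, NULL, 0 }; /* private, needs no lock */

	ex_lock(wakeupq);
	ex_splice_tail(wakeupq, &tmp);
	ex_unlock(wakeupq);

	ex_lock(inputq);
	ex_splice_tail(&tmp, inputq);
	ex_unlock(inputq);
}
```

Because the temporary list is private to the caller, the first lock can be dropped before the second is taken, so the nested acquisition that triggered the LOCKDEP report never occurs in this model.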
  11. 11 Oct, 2018 1 commit
    • tipc: set link tolerance correctly in broadcast link · 047491ea
      Jon Maloy committed
      In the patch referred to below we added link tolerance as an additional
      criterion for declaring broadcast transmission "stale" and resetting the
      affected links.
      
      However, the 'tolerance' field of the broadcast link is never set, and
      remains at zero. This renders the whole commit without the intended
      improving effect, but luckily also with no negative effect.
      
      In this commit we add the missing initialization.
      
      Fixes: a4dc70d4 ("tipc: extend link reset criteria for stale packet retransmission")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      047491ea
  12. 02 Oct, 2018 1 commit
    • L
      tipc: ignore STATE_MSG on wrong link session · d949cfed
      LUU Duc Canh authored
      The initial session number when a link is created is based on a random
      value, taken from struct tipc_net->random. It is then incremented for
      each link reset to avoid mixing protocol messages from different link
      sessions.
      
      However, when a bearer is reset all its links are deleted, and will
      later be re-created using the same random value as the first time.
      This means that if the link never went down between creation and
      deletion we will still sometimes have two subsequent sessions with
      the same session number. In virtual environments with potentially
      long transmission times this has turned out to be a real problem.
      
      We now fix this by randomizing the session number each time a link
      is created.
      
      With a session number size of 16 bits this gives a risk of session
      collision of 1/64k. To reduce this further, we also introduce a sanity
      check on the very first STATE message arriving at a link. If this has
      an acknowledge value differing from 0, which is logically impossible,
      we ignore the message. The final risk for session collision is hence
      reduced to 1/4G, which should be sufficient.
      Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d949cfed
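
A minimal sketch of the sanity check and of the collision arithmetic from the commit above. The struct and function names are hypothetical, and the 1/4G figure is reproduced under the assumption that the acknowledge field of a stale message is an evenly distributed 16-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-link state; not the kernel's struct tipc_link. */
struct link_state {
    uint16_t peer_session;      /* session number of the current session */
    bool     first_state_seen;  /* false until one STATE message is accepted */
};

/* Decide whether a STATE message may be processed on this link. */
bool accept_state_msg(struct link_state *l, uint16_t session, uint16_t ack)
{
    if (session != l->peer_session)
        return false;               /* wrong session: ignore */
    if (!l->first_state_seen) {
        if (ack != 0)
            return false;           /* first STATE cannot acknowledge anything */
        l->first_state_seen = true;
    }
    return true;
}

/* Number of equally likely (session, ack) pairs a stale message must hit. */
uint64_t collision_space(unsigned session_bits, unsigned ack_bits)
{
    return (uint64_t)1 << (session_bits + ack_bits);
}
```

With 16 session bits alone the space is 65536 (the 1/64k risk); adding the 16-bit acknowledge check gives 2^32 = 4294967296, the 1/4G figure quoted in the message.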
  13. 30 Sep, 2018 1 commit
    • L
      tipc: fix failover problem · c140eb16
      LUU Duc Canh authored
      We see the following scenario:
      1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
         Since there is a second working link, failover procedure is started.
      2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
         A on node 2. The node item 1->2 goes to state FAILINGOVER.
      3) Link endpoint A/2 receives the failover, and is supposed to take
         down its parallel link endpoint B/2, while producing a FAILOVER
         message to send back to A/1.
      4) However, B/2 has already been deleted, so no FAILOVER message can
         be created.
      5) Node 1->2 remains in state FAILINGOVER forever, refusing to receive
         any messages that can bring B/1 up again. We are left with a
         non-redundant link between node 1 and 2.
      
      We fix this by letting endpoint A/2 build a dummy FAILOVER message
      to send back to A/1, so that the situation can be resolved.
      Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c140eb16
  14. 26 Sep, 2018 1 commit
  15. 22 Jul, 2018 1 commit
    • Y
      tipc: make some functions static · e064cce1
      YueHaibing authored
      Fixes the following sparse warnings:
      
      net/tipc/link.c:376:5: warning: symbol 'link_bc_rcv_gap' was not declared. Should it be static?
      net/tipc/link.c:823:6: warning: symbol 'link_prepare_wakeup' was not declared. Should it be static?
      net/tipc/link.c:959:6: warning: symbol 'tipc_link_advance_backlog' was not declared. Should it be static?
      net/tipc/link.c:1009:5: warning: symbol 'tipc_link_retrans' was not declared. Should it be static?
      net/tipc/monitor.c:687:5: warning: symbol '__tipc_nl_add_monitor_peer' was not declared. Should it be static?
      net/tipc/group.c:230:20: warning: symbol 'tipc_group_find_member' was not declared. Should it be static?
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e064cce1
  16. 19 Jul, 2018 1 commit
  17. 12 Jul, 2018 2 commits
    • J
      tipc: check session number before accepting link protocol messages · 7ea817f4
      Jon Maloy authored
      In some virtual environments we observe significantly more packet
      reordering and longer delays than we have traditionally been used to.
      
      This calls for stricter checks on the session number of incoming link
      protocol messages, which until now has only been validated for
      RESET messages.
      
      Since the other two message types, ACTIVATE and STATE messages also
      carry this number, it is easy to extend the validation check to those
      messages.
      
      We also introduce a flag indicating if a link has a valid peer session
      number or not. This eliminates the mixing of 32- and 16-bit arithmetic
      we are currently using to achieve this.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7ea817f4
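
One way the extended check could look, with an explicit validity flag and pure 16-bit wrap-around comparison. This is a sketch under assumptions: the names are hypothetical, and the per-type rules (RESET needs a newer session, ACTIVATE a newer or current one, STATE the current one) are an interpretation of the commit text, not a copy of the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

enum proto_msg { RESET_MSG, ACTIVATE_MSG, STATE_MSG };  /* illustrative values */

struct link_session {
    bool     in_session;     /* true once a valid peer session number is known */
    uint16_t peer_session;
};

/* True if a is "newer" than b in 16-bit wrap-around arithmetic. */
bool session_newer(uint16_t a, uint16_t b)
{
    return a != b && (uint16_t)(a - b) < 0x8000u;
}

/* Validate the session number carried by a link protocol message. */
bool session_ok(const struct link_session *l, enum proto_msg type,
                uint16_t session)
{
    switch (type) {
    case RESET_MSG:      /* must announce a new session */
        return !l->in_session || session_newer(session, l->peer_session);
    case ACTIVATE_MSG:   /* new or current session */
        return !l->in_session || !session_newer(l->peer_session, session);
    case STATE_MSG:      /* current session only */
        return l->in_session && session == l->peer_session;
    }
    return false;
}
```

Keeping "is the number valid" in a separate boolean means the number itself never needs a wider type to encode an "unset" value.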
    • J
      tipc: add sequence number check for link STATE messages · 9012de50
      Jon Maloy authored
      Some switch infrastructures produce huge amounts of packet duplicates.
      This becomes a problem if those messages are STATE/NACK protocol
      messages, causing unnecessary retransmissions of already accepted
      packets.
      
      We now introduce a unique sequence number per STATE protocol message
      so that duplicates can be identified and ignored. This will also be
      useful when tracing such cases, and to avert replay attacks when TIPC
      is encrypted.
      
      For compatibility reasons we have to introduce a new capability flag
      TIPC_LINK_PROTO_SEQNO to handle this new feature.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9012de50
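
Duplicate detection with a per-message sequence number might be sketched as follows. The struct, the field names and the 16-bit width are assumptions for illustration; only the capability gating and the "ignore duplicates" behaviour come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receive-side state for STATE messages. */
struct state_rx {
    bool     peer_has_seqno;  /* peer advertised the new capability */
    uint16_t rcv_nxt_state;   /* next STATE sequence number expected */
};

/* Returns true if the STATE message should be processed. */
bool state_msg_is_new(struct state_rx *rx, uint16_t seqno)
{
    if (!rx->peer_has_seqno)
        return true;    /* legacy peer: nothing to check */
    /* Older than expected in wrap-around terms: a duplicate. */
    if ((uint16_t)(seqno - rx->rcv_nxt_state) >= 0x8000u)
        return false;
    rx->rcv_nxt_state = (uint16_t)(seqno + 1);
    return true;
}
```

The capability test comes first so that peers which never set the field are treated exactly as before.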
  18. 07 Jul, 2018 1 commit
    • J
      tipc: extend link reset criteria for stale packet retransmission · a4dc70d4
      Jon Maloy authored
      Currently a link is declared stale and reset if there have been 100
      repeated attempts to retransmit the same packet. However, in certain
      infrastructures we see that packet (NACK) duplicates and delays may
      cause such retransmit attempts to occur at a high rate, so that the
      peer doesn't have a reasonable chance to acknowledge the reception
      before the 100-limit is hit. This may take much less than the
      stipulated link tolerance time, and despite that probe/probe replies
      otherwise go through as normal.
      
      We now extend the criteria for link reset to also be time based.
      I.e., we don't reset the link until the link tolerance time has passed
      AND we have made 100 retransmission attempts.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4dc70d4
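
The combined criterion from the commit above reduces to a conjunction. A minimal sketch, in which the function name, the parameters and the millisecond units are assumptions:

```c
#include <stdbool.h>

#define STALE_RETRANS_LIMIT 100   /* the "100-limit" from the commit text */

/* Reset only when both the retransmit count and the elapsed time say so. */
bool link_is_stale(unsigned retransmits, unsigned long elapsed_ms,
                   unsigned long tolerance_ms)
{
    return retransmits >= STALE_RETRANS_LIMIT && elapsed_ms > tolerance_ms;
}
```

Note that with `tolerance_ms` left at zero the time condition is satisfied almost immediately, so the check degenerates to the old count-only rule.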
  19. 05 Jul, 2018 1 commit
  20. 01 Apr, 2018 2 commits
    • J
      tipc: avoid possible string overflow · 7494cfa6
      Jon Maloy authored
      gcc points out that the combined length of the fixed-length inputs to
      l->name is larger than the destination buffer size:
      
      net/tipc/link.c: In function 'tipc_link_create':
      net/tipc/link.c:465:26: error: '%s' directive writing up to 32 bytes
      into a region of size between 26 and 58 [-Werror=format-overflow=]
      sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
      
      net/tipc/link.c:465:2: note: 'sprintf' output 11 or more bytes
      (assuming 75) into a destination of size 60
      sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
      
      A detailed analysis reveals that the theoretical maximum length of
      a link name is:
      max self_str + 1 + max if_name + 1 + max peer_str + 1 + max if_name =
      16 + 1 + 15 + 1 + 16 + 1 + 15 = 65
      Since we also need space for a trailing zero we now set MAX_LINK_NAME
      to 68.
      
      Just to be on the safe side we also replace the sprintf() call with
      snprintf().
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address
      hash values")
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7494cfa6
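
The length arithmetic and the switch to snprintf() can be checked with a small user-space sketch. The macro names and `build_link_name()` are hypothetical; the component limits and the format string are taken from the commit message.

```c
#include <stdio.h>
#include <string.h>

/* Component limits taken from the arithmetic in the commit message. */
#define MAX_ADDR_STR  16
#define MAX_IF_NAME   15
#define MAX_LINK_NAME 68

/* Worst case "<self>:<if>-<peer>:<if>" without the terminating NUL. */
enum { WORST_CASE_LEN = MAX_ADDR_STR + 1 + MAX_IF_NAME + 1 +
                        MAX_ADDR_STR + 1 + MAX_IF_NAME };

/* Returns 0 on success, -1 if the name did not fit and was truncated. */
int build_link_name(char *dst, size_t size, const char *self_str,
                    const char *if_name, const char *peer_str)
{
    int n = snprintf(dst, size, "%s:%s-%s:unknown",
                     self_str, if_name, peer_str);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Unlike sprintf(), snprintf() never writes past `size` bytes and always terminates the string when `size` is non-zero, so an oversized input is truncated instead of overflowing the buffer.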
    • J
      tipc: replace name table service range array with rb tree · 218527fe
      Jon Maloy authored
      The current design of the binding table has an unnecessarily memory
      consuming and complex data structure. It aggregates the service range
      items into an array, which is expanded by a factor of two every time it
      becomes too small to hold a new item. Furthermore, the arrays never
      shrink when the number of ranges diminishes.
      
      We now replace this array with an RB tree that is holding the range
      items as tree nodes, each range directly holding a list of bindings.
      
      This, along with a few name changes, improves both readability and
      volume of the code, as well as reducing memory consumption and hopefully
      improving cache hit rate.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      218527fe
  21. 24 Mar, 2018 4 commits
    • J
      tipc: handle collisions of 32-bit node address hash values · 25b0b9c4
      Jon Maloy authored
      When a 32-bit node address is generated from a 128-bit identifier,
      there is a risk of collisions which must be discovered and handled.
      
      We do this as follows:
      - We don't apply the generated address immediately to the node, but do
        instead initiate a 1 sec trial period to allow other cluster members
        to discover and handle such collisions.
      
      - During the trial period the node periodically sends out a new type
        of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
        to all the other nodes in the cluster.
      
      - When a node is receiving such a message, it must check that the
        presented 32-bit identifier either is unused, or was used by the very
        same peer in a previous session. In both cases it accepts the request
        by not responding to it.
      
      - If it finds that the same node has been up before using a different
        address, it responds with a DSC_TRIAL_FAIL_MSG containing that
        address.
      
      - If it finds that the address has already been taken by some other
        node, it generates a new, unused address and returns it to the
        requester.
      
      - During the trial period the requesting node must always be prepared
        to accept a failure message, i.e., a message where a peer suggests a
        different (or equal) address to the one tried. In those cases it
        must apply the suggested value as trial address and restart the trial
        period.
      
      This algorithm ensures that in the vast majority of cases a node will
      have the same address before and after a reboot. If a legacy user
      configures the address explicitly, there will be no trial period and
      messages, so this protocol addition is completely backwards compatible.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      25b0b9c4
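
The receiver side of the trial protocol above can be sketched as a single decision function. Everything here is illustrative: `struct peer`, `handle_trial()` and the verdict names are hypothetical, and `pick_unused()` simply counts upward, whereas the commit message does not say how a new address is generated.

```c
#include <stdint.h>
#include <string.h>

#define ID_LEN 16   /* 128-bit node identity */

struct peer { uint8_t id[ID_LEN]; uint32_t addr; };

enum verdict { TRIAL_ACCEPT, TRIAL_FAIL };

/* Hypothetical generator: first address above start that nobody uses. */
static uint32_t pick_unused(const struct peer *known, size_t n, uint32_t start)
{
    uint32_t cand = start;
    size_t i;

    do {
        cand++;
        for (i = 0; i < n && known[i].addr != cand; i++)
            ;
    } while (i < n);
    return cand;
}

/*
 * TRIAL_ACCEPT: stay silent. TRIAL_FAIL: reply with *suggested, which
 * stands in for the address carried by a DSC_TRIAL_FAIL_MSG.
 */
enum verdict handle_trial(const struct peer *known, size_t n,
                          const uint8_t *id, uint32_t trial_addr,
                          uint32_t *suggested)
{
    int taken_by_other = 0;

    for (size_t i = 0; i < n; i++) {
        int same_id = memcmp(known[i].id, id, ID_LEN) == 0;

        if (same_id && known[i].addr != trial_addr) {
            *suggested = known[i].addr;   /* node was up with another address */
            return TRIAL_FAIL;
        }
        if (!same_id && known[i].addr == trial_addr)
            taken_by_other = 1;
    }
    if (taken_by_other) {
        *suggested = pick_unused(known, n, trial_addr);
        return TRIAL_FAIL;
    }
    return TRIAL_ACCEPT;                  /* unused, or same peer as before */
}
```

Accepting by staying silent keeps the common, collision-free case free of extra traffic; only conflicts generate a reply.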
    • J
      tipc: add 128-bit node identifier · d50ccc2d
      Jon Maloy authored
      We add a 128-bit node identity, as an alternative to the currently used
      32-bit node address.
      
      For the sake of compatibility and to minimize message header changes
      we retain the existing 32-bit address field. When not set explicitly by
      the user, this field will be filled with a hash value generated from the
      much longer node identity, and be used as a shorthand value for the
      latter.
      
      We permit either the address or the identity to be set by configuration,
      but not both, so when the address value is set by a legacy user the
      corresponding 128-bit node identity is generated based on that value.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d50ccc2d
    • J
      tipc: remove direct accesses to own_addr field in struct tipc_net · 23fd3eac
      Jon Maloy authored
      As a preparation to changing the addressing structure of TIPC we replace
      all direct accesses to the tipc_net::own_addr field with the function
      dedicated for this, tipc_own_addr().
      
      There are no changes to program logic in this commit.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      23fd3eac
    • J
      tipc: remove restrictions on node address values · 20263641
      Jon Maloy authored
      Nominally, TIPC organizes network nodes into a three-level network
      hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
      hierarchy is reflected in the node address format: it is sub-divided
      into an 8-bit zone id, a 12-bit cluster id, and a 12-bit node id.
      
      However, the 'zone' and 'cluster' levels have in reality never been
      fully implemented, and never will be. The result of this has been
      that the first 20 bits of the node identity structure have been wasted,
      and the usable node identity range within a cluster has been limited
      to 12 bits. This is starting to become a problem.
      
      In the following commits, we will need to be able to connect between
      nodes which are using the whole 32-bit value space of the node address.
      We therefore remove the restrictions on which values can be assigned
      to node identity; from now on it is only a 32-bit integer with no
      assumed internal structure.
      
      Isolation between clusters is now achieved only by setting different
      values for the 'network id' field used during neighbor discovery, in
      practice leading to the latter becoming the new cluster identity.
      
      The rules for accepting discovery requests/responses from neighboring
      nodes now become:
      
      - If the user is using legacy address format on both peers, reception
        of discovery messages is subject to the legacy lookup domain check
        in addition to the cluster id check.
      
      - Otherwise, the discovery request/response is always accepted, provided
        both peers have the same network id.
      
      This secures backwards compatibility for users who have been using zone
      or cluster identities as cluster separators, instead of the intended
      'network id'.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      20263641
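
For reference, the legacy 8/12/12-bit layout and the new acceptance rule can be written out as below. The helper names are hypothetical, and `in_domain` merely stands in for the result of the legacy lookup domain check, whose details the commit message does not give.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy <zone.cluster.node> layout: 8 + 12 + 12 bits. */
uint32_t legacy_zone(uint32_t addr)    { return addr >> 24; }
uint32_t legacy_cluster(uint32_t addr) { return (addr >> 12) & 0xfffu; }
uint32_t legacy_node(uint32_t addr)    { return addr & 0xfffu; }

/* Compose a legacy address from its three fields. */
uint32_t legacy_addr(uint32_t z, uint32_t c, uint32_t n)
{
    return (z << 24) | ((c & 0xfffu) << 12) | (n & 0xfffu);
}

/* Acceptance rule for discovery messages as described in the commit. */
bool discovery_accepted(uint32_t own_net_id, uint32_t peer_net_id,
                        bool both_legacy, bool in_domain)
{
    if (own_net_id != peer_net_id)
        return false;       /* different cluster: never accept */
    return both_legacy ? in_domain : true;
}
```

For example, the legacy address 1.1.1 is the integer 0x01001001, of which only the low 12 bits were usable as a node number.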
  22. 15 Feb, 2018 1 commit
  23. 02 Dec, 2017 1 commit
    • J
      tipc: fall back to smaller MTU if allocation of local send skb fails · 4c94cc2d
      Jon Maloy authored
      When sending node local messages the code is using an 'mtu' of 66060
      bytes to avoid unnecessary fragmentation. During situations of low
      memory tipc_msg_build() may sometimes fail to allocate such large
      buffers, resulting in unnecessary send failures. This can easily be
      remedied by falling back to a smaller MTU, and then reassembling the
      buffer chain as if the message were arriving from a remote node.
      
      At the same time, we change the initial MTU setting of the broadcast
      link to a lower value, so that large messages always are fragmented
      into smaller buffers even when we run in single node mode. Apart from
      obtaining the same advantage as for the 'fallback' solution above, this
      turns out to give a significant performance improvement. This can
      probably be explained with the __pskb_copy() operation performed on the
      buffer for each recipient during reception. We found the optimal value
      for this, considering the most relevant skb pool, to be 3744 bytes.
      Acked-by: Ying Xue <ying.xue@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4c94cc2d
  24. 11 Nov, 2017 1 commit
    • J
      tipc: improve link resiliency when rps is activated · 8d6e79d3
      Jon Maloy authored
      Currently, the TIPC RPS dissector is based only on the incoming packets'
      source node address, hence steering all traffic from a node to the same
      core. We have seen that this makes the links vulnerable to starvation
      and unnecessary resets when we turn down the link tolerance to very low
      values.
      
      To reduce the risk of this happening, we exempt probe and probe reply
      packets from the convergence to one core per source node. Instead, we do
      the opposite: we try to spread those packets across as many cores as
      possible, by randomizing the flow selector key.
      
      To make such packets identifiable to the dissector, we add a new
      'is_keepalive' bit to word 0 of the LINK_PROTOCOL header. This bit is
      set both for PROBE and PROBE_REPLY messages, and only for those.
      
      It should be noted that these packets are not part of any flow anyway,
      and only constitute a minuscule fraction of all packets sent across a
      link. Hence, there is no risk that this will affect overall performance.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d6e79d3
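
The key selection described above amounts to one branch. In this sketch the function names are hypothetical, the random value is passed in by the caller to keep the example deterministic, and `core_for_key()` is a plain modulo standing in for the real RPS hash.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pick the flow key that the RPS hash is computed from. */
uint32_t rps_flow_key(uint32_t src_node, bool is_keepalive,
                      uint32_t random_value)
{
    /* Probes are spread over cores; all other traffic from a node
     * keeps converging on one core. */
    return is_keepalive ? random_value : src_node;
}

/* Map a key to a core index; a stand-in for the real RPS hash. */
unsigned core_for_key(uint32_t key, unsigned ncores)
{
    return key % ncores;
}
```

Since probes and probe replies are not part of any ordered flow, steering them to arbitrary cores cannot reorder user data.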
  25. 17 Oct, 2017 1 commit
  26. 13 Oct, 2017 3 commits
    • J
      tipc: guarantee delivery of UP event before first broadcast · 399574d4
      Jon Maloy authored
      The following scenario is possible:
      - A user joins a group, and immediately sends out a broadcast message
        to its members.
      - The broadcast message, following a different data path than the
        initial JOIN message sent out during the joining procedure, arrives
        at a receiver before the latter.
      - The receiver drops the message, since it is not ready to accept any
        messages until the JOIN has arrived.
      
      We avoid this by treating group protocol JOIN messages like unicast
      messages.
      - We let them pass through the recipient's multicast input queue, just
        like ordinary unicasts.
      - We force the first following broadcast to be sent as replicated
        unicast and to be acknowledged by the recipient before accepting
        any more broadcast transmissions.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      399574d4
    • J
      tipc: guarantee that group broadcast doesn't bypass group unicast · 2f487712
      Jon Maloy authored
      We need a mechanism guaranteeing that group unicasts sent out from a
      socket are not bypassed by later sent broadcasts from the same socket.
      We do this as follows:
      
      - Each time a unicast is sent, we set the broadcast method for the
        socket to "replicast" and "mandatory". This forces the first
        subsequent broadcast message to follow the same network and data path
        as the preceding unicast to a destination, hence preventing it from
        overtaking the latter.
      
      - In order to make the 'same data path' statement above true, we let
        group unicasts pass through the multicast link input queue, instead
        of as previously through the unicast link input queue.
      
      - In the first broadcast following a unicast, we set a new header flag,
        requiring all recipients to immediately acknowledge its reception.
      
      - During the period before all the expected acknowledges are received,
        the socket refuses to accept any more broadcast attempts, i.e., by
        blocking or returning EAGAIN. This period should typically not be
        longer than a few microseconds.
      
      - When all acknowledges have been received, the sending socket will
        open up for subsequent broadcasts, this time giving the link layer
        freedom to itself select the best transmission method.
      
      - The forced and/or abrupt transmission method changes described above
        may lead to broadcasts arriving out of order to the recipients. We
        remedy this by introducing code that checks and if necessary
        re-orders such messages at the receiving end.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f487712
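
The sender-side gating in the list above can be modelled as a small state machine. This is a sketch with hypothetical names (`struct grp_sock`, `send_broadcast()` and so on); it covers only the "force replicast, request acks, refuse further broadcasts until acked" part and leaves out the receiver-side re-ordering.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-socket state for one group member. */
struct grp_sock {
    int  members;          /* number of peers that must acknowledge */
    int  pending_acks;     /* acknowledgements still outstanding */
    bool force_replicast;  /* set by a unicast, consumed by next broadcast */
};

void send_unicast(struct grp_sock *s)
{
    s->force_replicast = true;   /* next broadcast must follow the same path */
}

/*
 * Returns 0 on success or -EAGAIN while earlier acks are outstanding.
 * *ack_required tells the caller whether to set the "acknowledge
 * immediately" header flag and to send as mandatory replicast.
 */
int send_broadcast(struct grp_sock *s, bool *ack_required)
{
    if (s->pending_acks > 0)
        return -EAGAIN;
    *ack_required = s->force_replicast;
    if (s->force_replicast) {
        s->pending_acks = s->members;
        s->force_replicast = false;
    }
    return 0;
}

void receive_ack(struct grp_sock *s)
{
    if (s->pending_acks > 0)
        s->pending_acks--;
}
```

Once the last acknowledgement arrives, `send_broadcast()` succeeds again with `*ack_required` false, which corresponds to handing the choice of transmission method back to the link layer.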
    • J
      tipc: introduce communication groups · 75da2163
      Jon Maloy authored
      As a preparation for introducing flow control for multicast and datagram
      messaging we need a more strictly defined framework than we have now. A
      socket must be able to keep track of exactly how many and which other
      sockets it is allowed to communicate with at any moment, and keep the
      necessary state for those.
      
      We therefore introduce a new concept we have named Communication Group.
      Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN.
      The call takes four parameters: 'type' serves as group identifier,
      'instance' serves as a logical member identifier, and 'scope' indicates
      the visibility of the group (node/cluster/zone). Finally, 'flags' makes
      it possible to set certain properties for the member. For now, there is
      only one flag, indicating if the creator of the socket wants to receive
      a copy of broadcast or multicast messages it is sending via the socket,
      and if it wants to be eligible as destination for its own anycasts.
      
      A group is closed, i.e., sockets which have not joined a group will
      not be able to send messages to or receive messages from members of
      the group, and vice versa.
      
      Any member of a group can send multicast ('group broadcast') messages
      to all group members, optionally including itself, using the primitive
      send(). The messages are received via the recvmsg() primitive. A socket
      can only be member of one group at a time.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      75da2163
  27. 22 Aug, 2017 1 commit
    • J
      tipc: don't reset stale broadcast send link · 40501f90
      Jon Paul Maloy authored
      When the broadcast send link after 100 attempts has failed to
      transfer a packet to all peers, we consider it stale, and reset
      it. Thereafter it needs to re-synchronize with the peers, something
      currently done by just resetting and re-establishing all links to
      all peers. This has turned out to be overkill, with potentially
      unwanted consequences for the remaining cluster.
      
      A closer analysis reveals that this can be done much more simply. When
      this kind of failure happens, for reasons that may lie outside the
      TIPC protocol, it is typically only one peer which is failing to
      receive and acknowledge packets. It is hence sufficient to identify
      and reset the links only to that peer to resolve the situation, without
      having to reset the broadcast link at all. This solution entails a much
      lower risk of negative consequences for the own node as well as for
      the overall cluster.
      
      We implement this change in this commit.
      Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      40501f90