提交 · 35c55c9877f8de0ab129fa1a309271d0ecc868b9 · openanolis / cloud-kernel

16 6月, 2016 1 次提交

tipc: add neighbor monitoring framework · 35c55c98

由 Jon Paul Maloy 提交于 6月 13, 2016

TIPC based clusters are by default set up with full-mesh link
connectivity between all nodes. Those links are expected to provide
a short failure detection time, by default set to 1500 ms. Because
of this, the background load for neighbor monitoring in an N-node
cluster increases with a factor N on each node, while the overall
monitoring traffic through the network infrastructure increases at
a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
scale well beyond ~100 nodes unless we significantly increase failure
discovery tolerance.

This commit introduces a framework and an algorithm that drastically
reduces this background load, while basically maintaining the original
failure detection times across the whole cluster. Using this algorithm,
background load will now grow at a rate of ~(2 * sqrt(N)) per node, and
at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will
now have to actively monitor 38 neighbors in a 400-node cluster, instead
of as before 399.

This "Overlapping Ring Supervision Algorithm" is completely distributed
and employs no centralized or coordinated state. It goes as follows:

- Each node makes up a linearly ascending, circular list of all its N
  known neighbors, based on their TIPC node identity. This algorithm
  must be the same on all nodes.

- The node then selects the next M = sqrt(N) - 1 nodes downstream from
  itself in the list, and chooses to actively monitor those. This is
  called its "local monitoring domain".

- It creates a domain record describing the monitoring domain, and
  piggy-backs this in the data area of all neighbor monitoring messages
  (LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in
  the cluster eventually (default within 400 ms) will learn about
  its monitoring domain.

- Whenever a node discovers a change in its local domain, e.g., a node
  has been added or has gone down, it creates and sends out a new
  version of its node record to inform all neighbors about the change.

- A node receiving a domain record from anybody outside its local domain
  matches this against its own list (which may not look the same), and
  chooses to not actively monitor those members of the received domain
  record that are also present in its own list. Instead, it relies on
  indications from the direct monitoring nodes if an indirectly
  monitored node has gone up or down. If a node is indicated lost, the
  receiving node temporarily activates its own direct monitoring towards
  that node in order to confirm, or not, that it is actually gone.

- Since each node is actively monitoring sqrt(N) downstream neighbors,
  each node is also actively monitored by the same number of upstream
  neighbors. This means that all non-direct monitoring nodes normally
  will receive sqrt(N) indications that a node is gone.

- A major drawback with ring monitoring is how it handles failures that
  cause massive network partitionings. If both a lost node and all its
  direct monitoring neighbors are inside the lost partition, the nodes in
  the remaining partition will never receive indications about the loss.
  To overcome this, each node also chooses to actively monitor some
  nodes outside its local domain. Those nodes are called remote domain
  "heads", and are selected in such a way that no node in the cluster
  will be more than two direct monitoring hops away. Because of this,
  each node, apart from monitoring the member of its local domain, will
  also typically monitor sqrt(N) remote head nodes.

- As an optimization, local list status, domain status and domain
  records are marked with a generation number. This saves senders from
  unnecessarily conveying  unaltered domain records, and receivers from
  performing unneeded re-adaptations of their node monitoring list, such
  as re-assigning domain heads.

- As a measure of caution we have added the possibility to disable the
  new algorithm through configuration. We do this by keeping a threshold
  value for the cluster size; a cluster that grows beyond this value
  will switch from full-mesh to ring monitoring, and vice versa when
  it shrinks below the value. This means that if the threshold is set to
  a value larger than any anticipated cluster size (default size is 32)
  the new algorithm is effectively disabled. A patch set for altering the
  threshold value and for listing the table contents will follow shortly.

- This change is fully backwards compatible.
Acked-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35c55c98

06 3月, 2015 1 次提交

tipc: add ip/udp media type · d0f91938

由 Erik Hugne 提交于 3月 05, 2015

The ip/udp bearer can be configured in a point-to-point
mode by specifying both local and remote ip/hostname,
or it can be enabled in multicast mode, where links are
established to all tipc nodes that have joined the same
multicast group. The multicast IP address is generated
based on the TIPC network ID, but can be overridden by
using another multicast address as remote ip.
Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d0f91938

10 2月, 2015 3 次提交

tipc: remove tipc_snprintf · 941787b8

由 Richard Alpe 提交于 2月 09, 2015

tipc_snprintf() was heavily utilized by the old netlink API which no
longer exists (now netlink compat).

In this patch we swap tipc_snprintf() to the identical scnprintf() in
the only remaining occurrence.
Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

941787b8

tipc: nl compat add noop and remove legacy nl framework · 22ae7cff

由 Richard Alpe 提交于 2月 09, 2015

Add TIPC_CMD_NOOP to compat layer and remove the old framework.

All legacy nl commands are now converted to the compat layer in
netlink_compat.c.
Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22ae7cff

tipc: move and rename the legacy nl api to "nl compat" · bfb3e5dd

由 Richard Alpe 提交于 2月 09, 2015

The new netlink API is no longer "v2" but rather the standard API and
the legacy API is now "nl compat". We split them into separate
start/stop and put them in different files in order to further
distinguish them.
Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bfb3e5dd

27 11月, 2014 1 次提交

tipc: remove node subscription infrastructure · a8f48af5

由 Ying Xue 提交于 11月 26, 2014

The node subscribe infrastructure represents a virtual base class, so
its users, such as struct tipc_port and struct publication, can derive
its implemented functionalities. However, after the removal of struct
tipc_port, struct publication is left as its only single user now. So
defining an abstract infrastructure for one user becomes no longer
reasonable. If corresponding new functions associated with the
infrastructure are moved to name_table.c file, the node subscription
infrastructure can be removed as well.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8f48af5

24 8月, 2014 2 次提交

tipc: remove files ref.h and ref.c · 808d90f9

由 Jon Paul Maloy 提交于 8月 22, 2014

The reference table is now 'socket aware' instead of being generic,
and has in reality become a socket internal table. In order to be
able to minimize the API exposed by the socket layer towards the rest
of the stack, we now move the reference table definitions and functions
into the file socket.c, and rename the functions accordingly.

There are no functional changes in this commit.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

808d90f9

tipc: remove source file port.c · 0fc87aae

由 Jon Paul Maloy 提交于 8月 22, 2014

In this commit, we move the remaining functions in port.c to
socket.c, and give them new names that correspond to their new
location. We then remove the file port.c.

There are only cosmetic changes to the moved functions.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0fc87aae

06 5月, 2014 1 次提交

tipc: purge signal handler infrastructure · 52ff8720

由 Ying Xue 提交于 5月 05, 2014

In the previous commits of this series, we removed all asynchronous
actions which were based on the tasklet handler - "tipc_k_signal()".

So the moment has now come when we can completely remove the tasklet
handler infrastructure. That is done with this commit.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52ff8720

18 6月, 2013 2 次提交

tipc: introduce new TIPC server infrastructure · c5fa7b3c

由 Ying Xue 提交于 6月 17, 2013

TIPC has two internal servers, one providing a subscription
service for topology events, and another providing the
configuration interface. These servers have previously been running
in BH context, accessing the TIPC-port (aka native) API directly.
Apart from these servers, even the TIPC socket implementation is
partially built on this API.

As this API may simultaneously be called via different paths and in
different contexts, a complex and costly lock policiy is required
in order to protect TIPC internal resources.

To eliminate the need for this complex lock policiy, we introduce
a new, generic service API that uses kernel sockets for message
passing instead of the native API. Once the toplogy and configuration
servers are converted to use this new service, all code pertaining
to the native API can be removed. This entails a significant
reduction in code amount and complexity, and opens up for a complete
rework of the locking policy in TIPC.

The new service also solves another problem:

As the current topology server works in BH context, it cannot easily
be blocked when sending of events fails due to congestion. In such
cases events may have to be silently dropped, something that is
unacceptable. Therefore, the new service keeps a dedicated outbound
queue receiving messages from BH context. Once messages are
inserted into this queue, we will immediately schedule a work from a
special workqueue. This way, messages/events from the topology server
are in reality sent in process context, and the server can block
if necessary.

Analogously, there is a new workqueue for receiving messages. Once a
notification about an arriving message is received in BH context, we
schedule a work from the receive workqueue to do the job of
receiving the message in process context.

As both sending and receive messages are now finished in processes,
subscribed events cannot be dropped any more.

As of this commit, this new server infrastructure is built, but
not actually yet called by the existing TIPC code, but since the
conversion changes required in order to use it are significant,
the addition is kept here as a separate commit.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5fa7b3c

tipc: change socket buffer overflow control to respect sk_rcvbuf · cc79dd1b

由 Ying Xue 提交于 6月 17, 2013

As per feedback from the netdev community, we change the buffer
overflow protection algorithm in receiving sockets so that it
always respects the nominal upper limit set in sk_rcvbuf.

Instead of scaling up from a small sk_rcvbuf value, which leads to
violation of the configured sk_rcvbuf limit, we now calculate the
weighted per-message limit by scaling down from a much bigger value,
still in the same field, according to the importance priority of the
received message.

To allow for administrative tunability of the socket receive buffer
size, we create a tipc_rmem sysctl variable to allow the user to
configure an even bigger value via sysctl command.  It is a size of
three (min/default/max) to be consistent with things like tcp_rmem.

By default, the value initialized in tipc_rmem[1] is equal to the
receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message.
This value is also set as the default value of sk_rcvbuf.
Originally-by: NJon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
[Ying: added sysctl variation to Jon's original patch]
Signed-off-by: NYing Xue <ying.xue@windriver.com>
[PG: don't compile sysctl.c if not config'd; add Documentation]
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc79dd1b

18 4月, 2013 1 次提交

tipc: add InfiniBand media type · a29a194a

由 Patrick McHardy 提交于 4月 17, 2013

Add InfiniBand media type based on the ethernet media type.

The only real difference is that in case of InfiniBand, we need the entire
20 bytes of space reserved for media addresses, so the TIPC media type ID is
not explicitly stored in the packet payload.

Sample output of tipc-config:

# tipc-config -v -addr -netid -nt=all -p -m -b -n -ls

node address: <10.1.4>
current network id: 4711
Type       Lower      Upper      Port Identity              Publication Scope
0          167776257  167776257  <10.1.1:1855512577>        1855512578  cluster
           167776260  167776260  <10.1.4:1216454657>        1216454658  zone
1          1          1          <10.1.4:1216479235>        1216479236  node
Ports:
1216479235: bound to {1,1}
1216454657: bound to {0,167776260}
Media:
eth
ib
Bearers:
ib:ib0
Nodes known:
<10.1.1>: up
Link <broadcast-link>
  Window:20 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:0 dups:0
  TX naks:0 acks:0 dups:0
  Congestion bearer:0 link:0  Send queue max:0 avg:0

Link <10.1.4:ib0-10.1.1:ib0>
  ACTIVE  MTU:2044  Priority:10  Tolerance:1500 ms  Window:50 packets
  RX packets:80 fragments:0/0 bundles:0/0
  TX packets:40 fragments:0/0 bundles:0/0
  TX profile sample:22 packets  average:54 octets
  0-64:100% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
  RX states:410 probes:213 naks:0 defs:0 dups:0
  TX states:410 probes:197 naks:0 acks:0 dups:0
  Congestion bearer:0 link:0  Send queue max:1 avg:0
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a29a194a

01 5月, 2012 1 次提交

tipc: compress out gratuitous extra carriage returns · 617d3c7a

由 Paul Gortmaker 提交于 4月 30, 2012

Some of the comment blocks are floating in limbo between two
functions, or between blocks of code.  Delete the extra line
feeds between any comment and its associated following block
of code, to be consistent with the majority of the rest of
the kernel.  Also delete trailing newlines at EOF and fix
a couple trivial typos in existing comments.

This is a 100% cosmetic change with no runtime impact.  We get
rid of over 500 lines of non-code, and being blank line deletes,
they won't even show up as noise in git blame.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

617d3c7a

02 1月, 2011 4 次提交

tipc: rename dbg.[ch] to log.[ch] · f5e75269

由 Allan Stephens 提交于 12月 31, 2010

As the first step in removing obsolete debugging code from TIPC the
files that implement TIPC's non-debug-related log buffer subsystem
are renamed to better reflect their true nature.
Signed-off-by: NAllan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5e75269

tipc: Remove user registry subsystem · b0c1e928

由 Allan Stephens 提交于 12月 31, 2010

Eliminates routines, data structures, and files that make up TIPC's
user registry. The user registry is no longer needed since the native
API routines that utilized it no longer exist and there are no longer
any internal TIPC services that use it.
Signed-off-by: NAllan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b0c1e928

tipc: Remove prototype code for supporting multiple clusters · 8f92df6a

由 Allan Stephens 提交于 12月 31, 2010

Eliminates routines, data structures, and files that were intended
to allow TIPC to support a network containing multiple clusters.
Currently, TIPC supports only networks consisting of a single cluster
within a single zone, so this code is unnecessary.
Signed-off-by: NAllan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f92df6a

tipc: Remove prototype code for supporting multiple zones · 51f98a8d

由 Allan Stephens 提交于 12月 31, 2010

Eliminates routines, data structures, and files that were intended
to allows TIPC to support a network containing multiple zones.
Currently, TIPC supports only networks consisting of a single cluster
within a single zone, so this code is unnecessary.
Signed-off-by: NAllan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51f98a8d

13 1月, 2006 1 次提交

[TIPC] Initial merge · b97bf3fd

由 Per Liden 提交于 1月 02, 2006

TIPC (Transparent Inter Process Communication) is a protocol designed for
intra cluster communication. For more information see
http://tipc.sourceforge.netSigned-off-by: NPer Liden <per.liden@nospam.ericsson.com>

b97bf3fd

openanolis / cloud-kernel 11 个月 前同步成功

openanolis / cloud-kernel
11 个月前同步成功