提交 · 10724cc7bb7832b482df049c20fd824d928c5eaa · openanolis / cloud-kernel

04 5月, 2016 1 次提交

tipc: redesign connection-level flow control · 10724cc7

由 Jon Paul Maloy 提交于 5月 02, 2016

There are two flow control mechanisms in TIPC; one at link level that
handles network congestion, burst control, and retransmission, and one
at connection level which' only remaining task is to prevent overflow
in the receiving socket buffer. In TIPC, the latter task has to be
solved end-to-end because messages can not be thrown away once they
have been accepted and delivered upwards from the link layer, i.e, we
can never permit the receive buffer to overflow.

Currently, this algorithm is message based. A counter in the receiving
socket keeps track of number of consumed messages, and sends a dedicated
acknowledge message back to the sender for each 256 consumed message.
A counter at the sending end keeps track of the sent, not yet
acknowledged messages, and blocks the sender if this number ever reaches
512 unacknowledged messages. When the missing acknowledge arrives, the
socket is then woken up for renewed transmission. This works well for
keeping the message flow running, as it almost never happens that a
sender socket is blocked this way.

A problem with the current mechanism is that it potentially is very
memory consuming. Since we don't distinguish between small and large
messages, we have to dimension the socket receive buffer according
to a worst-case of both. I.e., the window size must be chosen large
enough to sustain a reasonable throughput even for the smallest
messages, while we must still consider a scenario where all messages
are of maximum size. Hence, the current fix window size of 512 messages
and a maximum message size of 66k results in a receive buffer of 66 MB
when truesize(66k) = 131k is taken into account. It is possible to do
much better.

This commit introduces an algorithm where we instead use 1024-byte
blocks as base unit. This unit, always rounded upwards from the
actual message size, is used when we advertise windows as well as when
we count and acknowledge transmitted data. The advertised window is
based on the configured receive buffer size in such a way that even
the worst-case truesize/msgsize ratio always is covered. Since the
smallest possible message size (from a flow control viewpoint) now is
1024 bytes, we can safely assume this ratio to be less than four, which
is the value we are now using.

This way, we have been able to reduce the default receive buffer size
from 66 MB to 2 MB with maintained performance.

In order to keep this solution backwards compatible, we introduce a
new capability bit in the discovery protocol, and use this throughout
the message sending/reception path to always select the right unit.
Acked-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10724cc7

27 7月, 2015 1 次提交

tipc: clean up socket layer message reception · cda3696d

由 Jon Paul Maloy 提交于 7月 22, 2015

When a message is received in a socket, one of the call chains
tipc_sk_rcv()->tipc_sk_enqueue()->filter_rcv()(->tipc_sk_proto_rcv())
or
tipc_sk_backlog_rcv()->filter_rcv()(->tipc_sk_proto_rcv())
are followed. At each of these levels we may encounter situations
where the message may need to be rejected, or a new message
produced for transfer back to the sender. Despite recent
improvements, the current code for doing this is perceived
as awkward and hard to follow.

Leveraging the two previous commits in this series, we now
introduce a more uniform handling of such situations. We
let each of the functions in the chain itself produce/reverse
the message to be returned to the sender, but also perform the
actual forwarding. This simplifies the necessary logics within
each function.
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cda3696d

18 3月, 2015 1 次提交

tipc: fix netns refcnt leak · 76100a8a

由 Ying Xue 提交于 3月 18, 2015

When the TIPC module is loaded, we launch a topology server in kernel
space, which in its turn is creating TIPC sockets for communication
with topology server users. Because both the socket's creator and
provider reside in the same module, it is necessary that the TIPC
module's reference count remains zero after the server is started and
the socket created; otherwise it becomes impossible to perform "rmmod"
even on an idle module.

Currently, we achieve this by defining a separate "tipc_proto_kern"
protocol struct, that is used only for kernel space socket allocations.
This structure has the "owner" field set to NULL, which restricts the
module reference count from being be bumped when sk_alloc() for local
sockets is called. Furthermore, we have defined three kernel-specific
functions, tipc_sock_create_local(), tipc_sock_release_local() and
tipc_sock_accept_local(), to avoid the module counter being modified
when module local sockets are created or deleted. This has worked well
until we introduced name space support.

However, after name space support was introduced, we have observed that
a reference count leak occurs, because the netns counter is not
decremented in tipc_sock_delete_local().

This commit remedies this problem. But instead of just modifying
tipc_sock_delete_local(), we eliminate the whole parallel socket
handling infrastructure, and start using the regular sk_create_kern(),
kernel_accept() and sk_release_kernel() calls. Since those functions
manipulate the module counter, we must now compensate for that by
explicitly decrementing the counter after module local sockets are
created, and increment it just before calling sk_release_kernel().

Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reported-by: NCong Wang <cwang@twopensource.com>
Tested-by: NErik Hugne <erik.hugne@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76100a8a

10 2月, 2015 1 次提交

tipc: convert legacy nl socket dump to nl compat · 487d2a3a

由 Richard Alpe 提交于 2月 09, 2015

Convert socket (port) listing to compat dumpit call. If a socket
(port) has publications a second dumpit call is issued to collect them
and format then into the legacy buffer before continuing to process
the sockets (ports).

Command converted in this patch:
TIPC_CMD_SHOW_PORTS
Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

487d2a3a

06 2月, 2015 3 次提交

tipc: eliminate race condition at multicast reception · cb1b7280

由 Jon Paul Maloy 提交于 2月 05, 2015

In a previous commit in this series we resolved a race problem during
unicast message reception.

Here, we resolve the same problem at multicast reception. We apply the
same technique: an input queue serializing the delivery of arriving
buffers. The main difference is that here we do it in two steps.
First, the broadcast link feeds arriving buffers into the tail of an
arrival queue, which head is consumed at the socket level, and where
destination lookup is performed. Second, if the lookup is successful,
the resulting buffer clones are fed into a second queue, the input
queue. This queue is consumed at reception in the socket just like
in the unicast case. Both queues are protected by the same lock, -the
one of the input queue.
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cb1b7280

tipc: simplify socket multicast reception · 3c724acd

由 Jon Paul Maloy 提交于 2月 05, 2015

The structure 'tipc_port_list' is used to collect port numbers
representing multicast destination socket on a receiving node.
The list is not based on a standard linked list, and is in reality
optimized for the uncommon case that there are more than one
multicast destinations per node. This makes the list handling
unecessarily complex, and as a consequence, even the socket
multicast reception becomes more complex.

In this commit, we replace 'tipc_port_list' with a new 'struct
tipc_plist', which is based on a standard list. We give the new
list stack (push/pop) semantics, someting that simplifies
the implementation of the function tipc_sk_mcast_rcv().
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c724acd

tipc: resolve race problem at unicast message reception · c637c103

由 Jon Paul Maloy 提交于 2月 05, 2015

TIPC handles message cardinality and sequencing at the link layer,
before passing messages upwards to the destination sockets. During the
upcall from link to socket no locks are held. It is therefore possible,
and we see it happen occasionally, that messages arriving in different
threads and delivered in sequence still bypass each other before they
reach the destination socket. This must not happen, since it violates
the sequentiality guarantee.

We solve this by adding a new input buffer queue to the link structure.
Arriving messages are added safely to the tail of that queue by the
link, while the head of the queue is consumed, also safely, by the
receiving socket. Sequentiality is secured per socket by only allowing
buffers to be dequeued inside the socket lock. Since there may be multiple
simultaneous readers of the queue, we use a 'filter' parameter to reduce
the risk that they peek the same buffer from the queue, hence also
reducing the risk of contention on the receiving socket locks.

This solves the sequentiality problem, and seems to cause no measurable
performance degradation.

A nice side effect of this change is that lock handling in the functions
tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
will enable future simplifications of those functions.
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c637c103

13 1月, 2015 4 次提交

tipc: make subscriber server support net namespace · a62fbcce

由 Ying Xue 提交于 1月 09, 2015

TIPC establishes one subscriber server which allows users to subscribe
their interesting name service status. After tipc supports namespace,
one dedicated tipc stack instance is created for each namespace, and
each instance can be deemed as one independent TIPC node. As a result,
subscriber server must be built for each namespace.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Tested-by: NTero Aho <Tero.Aho@coriant.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a62fbcce

tipc: make tipc socket support net namespace · e05b31f4

由 Ying Xue 提交于 1月 09, 2015

Now tipc socket table is statically allocated as a global variable.
Through it, we can look up one socket instance with port ID, insert
a new socket instance to the table, and delete a socket from the
table. But when tipc supports net namespace, each namespace must own
its specific socket table. So the global variable of socket table
must be redefined in tipc_net structure. As a concequence, a new
socket table will be allocated when a new namespace is created, and
a socket table will be deallocated when namespace is destroyed.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Tested-by: NTero Aho <Tero.Aho@coriant.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e05b31f4

tipc: make tipc node table aware of net namespace · f2f9800d

由 Ying Xue 提交于 1月 09, 2015

Global variables associated with node table are below:
- node table list (node_htable)
- node hash table list (tipc_node_list)
- node table lock (node_list_lock)
- node number counter (tipc_num_nodes)
- node link number counter (tipc_num_links)

To make node table support namespace, above global variables must be
moved to tipc_net structure in order to keep secret for different
namespaces. As a consequence, these variables are allocated and
initialized when namespace is created, and deallocated when namespace
is destroyed. After the change, functions associated with these
variables have to utilize a namespace pointer to access them. So
adding namespace pointer as a parameter of these functions is the
major change made in the commit.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Tested-by: NTero Aho <Tero.Aho@coriant.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2f9800d

tipc: cleanup core.c and core.h files · 859fc7c0

由 Ying Xue 提交于 1月 09, 2015

Only the works of initializing and shutting down tipc module are done
in core.h and core.c files, so all stuffs which are not closely
associated with the two tasks should be moved to appropriate places.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Tested-by: NTero Aho <Tero.Aho@coriant.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

859fc7c0

09 1月, 2015 1 次提交

tipc: convert tipc reference table to use generic rhashtable · 07f6c4bc

由 Ying Xue 提交于 1月 07, 2015

As tipc reference table is statically allocated, its memory size
requested on stack initialization stage is quite big even if the
maximum port number is just restricted to 8191 currently, however,
the number already becomes insufficient in practice. But if the
maximum ports is allowed to its theory value - 2^32, its consumed
memory size will reach a ridiculously unacceptable value. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.

If tipc reference table is converted with generic rhashtable, above
mentioned both disadvantages would be resolved respectively: making
use of the new resizable hash table can avoid locking on the lookup;
smaller memory size is required at initial stage, for example, 256
hash bucket slots are requested at the beginning phase instead of
allocating the entire 8191 slots in old mode. The hash table will
grow if entries exceeds 75% of table size up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but the minimum table size is allowed down to 256.

Also converts ref_table_lock to a separate mutex to protect hash table
mutations on write side. Lastly defers the release of the socket
reference using call_rcu() to allow using an RCU read-side protected
call to rhashtable_lookup().
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Acked-by: NErik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

07f6c4bc

22 11月, 2014 2 次提交

tipc: add publication dump to new netlink api · 1a1a143d

由 Richard Alpe 提交于 11月 20, 2014

Add TIPC_NL_PUBL_GET command to the new tipc netlink API.

This command supports dumping of all publications for a specific
socket.

Netlink logical layout of request message:
    -> socket
        -> reference

Netlink logical layout of response message:
    -> publication
        -> type
        -> lower
        -> upper
Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Acked-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a1a143d

tipc: add sock dump to new netlink api · 34b78a12

由 Richard Alpe 提交于 11月 20, 2014

Add TIPC_NL_SOCK_GET command to the new tipc netlink API.

This command supports dumping of all available sockets with their
associated connection or publication(s). It could be extended to reply
with a single socket if the NLM_F_DUMP isn't set.

The information about a socket includes reference, address, connection
information / publication information.

Netlink logical layout of response message:
-> socket
    -> reference
    -> address
    [
    -> connection
        -> node
        -> socket
        [
        -> connected flag
        -> type
        -> instance
        ]
    ]
    [
    -> publication flag
    ]
Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Acked-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

34b78a12

24 8月, 2014 5 次提交

tipc: merge struct tipc_port into struct tipc_sock · 301bae56

由 Jon Paul Maloy 提交于 8月 22, 2014

We complete the merging of the port and socket layer by aggregating
the fields of struct tipc_port directly into struct tipc_sock, and
moving the combined structure into socket.c.

We also move all functions and macros that are not any longer
exposed to the rest of the stack into socket.c, and rename them
accordingly.

Despite the size of this commit, there are no functional changes.
We have only made such changes that are necessary due of the removal
of struct tipc_port.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

301bae56

tipc: remove files ref.h and ref.c · 808d90f9

由 Jon Paul Maloy 提交于 8月 22, 2014

The reference table is now 'socket aware' instead of being generic,
and has in reality become a socket internal table. In order to be
able to minimize the API exposed by the socket layer towards the rest
of the stack, we now move the reference table definitions and functions
into the file socket.c, and rename the functions accordingly.

There are no functional changes in this commit.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

808d90f9

tipc: remove include file port.h · 2e84c60b

由 Jon Paul Maloy 提交于 8月 22, 2014

We move the inline functions in the file port.h to socket.c, and modify
their names accordingly.

We move struct tipc_port and some macros to socket.h.

Finally, we remove the file port.h.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2e84c60b

tipc: use registry when scanning sockets · 5a9ee0be

由 Jon Paul Maloy 提交于 8月 22, 2014

The functions tipc_port_get_ports() and tipc_port_reinit() scan over
all sockets/ports to access each of them. This is done by using a
dedicated linked list, 'tipc_socks' where all sockets are members. The
list is in turn protected by a spinlock, 'port_list_lock', while each
socket is locked by using port_lock at the moment of access.

In order to reduce complexity and risk of deadlock, we want to get
rid of the linked list and the accompanying spinlock.

This is what we do in this commit. Instead of the linked list, we use
the port registry to scan across the sockets. We also add usage of
bh_lock_sock() inside the scope of port_lock in both functions, as a
preparation for the complete removal of port_lock.

Finally, we move the functions from port.c to socket.c, and rename them
to tipc_sk_sock_show() and tipc_sk_reinit() repectively.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a9ee0be

tipc: use pseudo message to wake up sockets after link congestion · 50100a5e

由 Jon Paul Maloy 提交于 8月 22, 2014

The current link implementation keeps a linked list of blocked ports/
sockets that is populated when there is link congestion. The purpose
of this is to let the link know which users to wake up when the
congestion abates.

This adds unnecessary complexity to the data structure and the code,
since it forces us to involve the link each time we want to delete
a socket. It also forces us to grab the spinlock port_lock within
the scope of node_lock. We want to get rid of this direct dependence,
as well as the deadlock hazard resulting from the usage of port_lock.

In this commit, we instead let the link keep list of a "wakeup" pseudo
messages for use in such situations. Those messages are sent to the
pending sockets via the ordinary message reception path, and wake up
the socket's owner when they are received.

This enables us to get rid of the 'waiting_ports' linked lists in struct
tipc_port that manifest this direct reference. As a consequence, we can
eliminate another BH entry into the socket, and hence the need to grab
port_lock. This is a further step in our effort to remove port_lock
altogether.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

50100a5e

17 7月, 2014 1 次提交

tipc: add new functions for multicast and broadcast distribution · 078bec82

由 Jon Paul Maloy 提交于 7月 16, 2014

We add a new broadcast link transmit function in bclink.c and a new
receive function in socket.c. The purpose is to move the branching
between external and internal destination down to the link layer,
just as we have done with unicast in earlier commits. We also make
use of the new link-independent fragmentation support that was
introduced in an earlier commit series.

This gives a shorter and simpler code path, and makes it possible
to obtain copy-free buffer delivery to all node local destination
sockets.

The new transmission code is added in parallel with the existing one,
and will be used by the socket multicast send function in the next
commit in this series.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

078bec82

28 6月, 2014 2 次提交

tipc: simplify connection congestion handling · 60120526

由 Jon Paul Maloy 提交于 6月 25, 2014

As a consequence of the recently introduced serialized access
to the socket in commit 8d94168a761819d10252bab1f8de6d7b202c3baa
("tipc: same receive code path for connection protocol and data
messages") we can make a number of simplifications in the
detection and handling of connection congestion situations.

- We don't need to keep two counters, one for sent messages and one
  for acked messages. There is no longer any risk for races between
  acknowledge messages arriving in BH and data message sending
  running in user context. So we merge this into one counter,
  'sent_unacked', which is incremented at sending and subtracted
  from at acknowledge reception.

- We don't need to set the 'congested' field in tipc_port to
  true before we sent the message, and clear it when sending
  is successful. (As a matter of fact, it was never necessary;
  the field was set in link_schedule_port() before any wakeup
  could arrive anyway.)

- We keep the conditions for link congestion and connection connection
  congestion separated. There would otherwise be a risk that an arriving
  acknowledge message may wake up a user sleeping because of link
  congestion.

- We can simplify reception of acknowledge messages.

We also make some cosmetic/structural changes:

- We rename the 'congested' field to the more correct 'link_cong´.

- We rename 'conn_unacked' to 'rcv_unacked'

- We move the above mentioned fields from struct tipc_port to
  struct tipc_sock.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60120526

tipc: clean up connection protocol reception function · ac0074ee

由 Jon Paul Maloy 提交于 6月 25, 2014

We simplify the code for receiving connection probes, leveraging the
recently introduced tipc_msg_reverse() function. We also stick to
the principle of sending a possible response message directly from
the calling (tipc_sk_rcv or backlog_rcv) functions, hence making
the call chain shallower and easier to follow.

We make one small protocol change here, allowed according to
the spec. If a protocol message arrives from a remote socket that
is not the one we are connected to, we are currently generating a
connection abort message and send it to the source. This behavior
is unnecessary, and might even be a security risk, so instead we
now choose to only ignore the message. The consequnce for the sender
is that he will need longer time to discover his mistake (until the
next timeout), but this is an extreme corner case, and may happen
anyway under other circumstances, so we deem this change acceptable.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac0074ee

15 5月, 2014 2 次提交

tipc: merge port message reception into socket reception function · 9816f061

由 Jon Paul Maloy 提交于 5月 14, 2014

In order to reduce complexity and save a call level during message
reception at port/socket level, we remove the function tipc_port_rcv()
and merge its functionality into tipc_sk_rcv().
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9816f061

tipc: compensate for double accounting in socket rcv buffer · 4f4482dc

由 Jon Paul Maloy 提交于 5月 14, 2014

The function net/core/sock.c::__release_sock() runs a tight loop
to move buffers from the socket backlog queue to the receive queue.

As a security measure, sk_backlog.len of the receiving socket
is not set to zero until after the loop is finished, i.e., until
the whole backlog queue has been transferred to the receive queue.
During this transfer, the data that has already been moved is counted
both in the backlog queue and the receive queue, hence giving an
incorrect picture of the available queue space for new arriving buffers.

This leads to unnecessary rejection of buffers by sk_add_backlog(),
which in TIPC leads to unnecessarily broken connections.

In this commit, we compensate for this double accounting by adding
a counter that keeps track of it. The function socket.c::backlog_rcv()
receives buffers one by one from __release_sock(), and adds them to the
socket receive queue. If the transfer is successful, it increases a new
atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
transferred buffer. If a new buffer arrives during this transfer and
finds the socket busy (owned), we attempt to add it to the backlog.
However, when sk_add_backlog() is called, we adjust the 'limit'
parameter with the value of the new counter, so that the risk of
inadvertent rejection is eliminated.

It should be noted that this change does not invalidate the original
purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
doesn't respect the send window) keeps pumping in buffers to
sk_add_backlog(), he will eventually reach an upper limit,
(2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
to the backlog, and the connection will be broken. Ordinary, well-
behaved senders will never reach this buffer limit at all.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4f4482dc

13 3月, 2014 3 次提交

tipc: align usage of variable names and macros in socket · 58ed9442

由 Jon Paul Maloy 提交于 3月 12, 2014

The practice of naming variables in TIPC is inconistent, sometimes
even within the same file.

In this commit we align variable names and declarations within
socket.c, and function and macro names within socket.h. We also
reduce the number of conversion macros to two, in order to make
usage less obsure.

These changes are purely cosmetic.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

58ed9442

tipc: eliminate upcall function pointers between port and socket · 24be34b5

由 Jon Paul Maloy 提交于 3月 12, 2014

Due to the original one-to-many relation between port and user API
layers, upcalls to the API have been performed via function pointers,
installed in struct tipc_port at creation. Since this relation now
always is one-to-one, we can instead use ordinary function calls.

We remove the function pointers 'dispatcher' and ´wakeup' from
struct tipc_port, and replace them with calls to the renamed
functions tipc_sk_rcv() and tipc_sk_wakeup().

At the same time we change the name and signature of the functions
tipc_createport() and tipc_deleteport() to reflect their new role
as mere initialization/destruction functions.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24be34b5

tipc: aggregate port structure into socket structure · 8826cde6

由 Jon Paul Maloy 提交于 3月 12, 2014

After the removal of the tipc native API the relation between
a tipc_port and its API types is strictly one-to-one, i.e, the
latter can now only be a socket API. There is therefore no need
to allocate struct tipc_port and struct sock independently.

In this commit, we aggregate struct tipc_port into struct tipc_sock,
hence saving both CPU cycles and structure complexity.

There are no functional changes in this commit, except for the
elimination of the separate allocation/freeing of tipc_port.
All other changes are just adaptatons to the new data structure.

This commit also opens up for further code simplifications and
code volume reduction, something we will do in later commits.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8826cde6

18 6月, 2013 1 次提交

tipc: change socket buffer overflow control to respect sk_rcvbuf · cc79dd1b

由 Ying Xue 提交于 6月 17, 2013

As per feedback from the netdev community, we change the buffer
overflow protection algorithm in receiving sockets so that it
always respects the nominal upper limit set in sk_rcvbuf.

Instead of scaling up from a small sk_rcvbuf value, which leads to
violation of the configured sk_rcvbuf limit, we now calculate the
weighted per-message limit by scaling down from a much bigger value,
still in the same field, according to the importance priority of the
received message.

To allow for administrative tunability of the socket receive buffer
size, we create a tipc_rmem sysctl variable to allow the user to
configure an even bigger value via sysctl command.  It is a size of
three (min/default/max) to be consistent with things like tcp_rmem.

By default, the value initialized in tipc_rmem[1] is equal to the
receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message.
This value is also set as the default value of sk_rcvbuf.
Originally-by: NJon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
[Ying: added sysctl variation to Jon's original patch]
Signed-off-by: NYing Xue <ying.xue@windriver.com>
[PG: don't compile sysctl.c if not config'd; add Documentation]
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc79dd1b

14 7月, 2012 3 次提交

tipc: remove print_buf and deprecated log buffer code · 869dd466

由 Erik Hugne 提交于 6月 29, 2012

The internal log buffer handling functions can now safely be
removed since there is no code using it anymore.  Requests to
interact with the internal tipc log buffer over netlink (in
config.c) will report 'obsolete command'.

This represents the final removal of any references to a
struct print_buf, and the removal of the struct itself.
We also get rid of a TIPC specific Kconfig in the process.

Finally, log.h is removed since it is not needed anymore.
Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

869dd466

tipc: phase out most of the struct print_buf usage · dc1aed37

由 Erik Hugne 提交于 6月 29, 2012

The tipc_printf is renamed to tipc_snprintf, as the new name
describes more what the function actually does.  It is also
changed to take a buffer and length parameter and return
number of characters written to the buffer.  All callers of
this function that used to pass a print_buf are updated.

Final removal of the struct print_buf itself will be done
synchronously with the pending removal of the deprecated
logging code that also was using it.

Functions that build up a response message with a list of
ports, nametable contents etc. are changed to return the number
of characters written to the output buffer. This information
was previously hidden in a field of the print_buf struct, and
the number of chars written was fetched with a call to
tipc_printbuf_validate.  This function is removed since it
is no longer referenced nor needed.

A generic max size ULTRA_STRING_MAX_LEN is defined, named
in keeping with the existing TIPC_TLV_ULTRA_STRING, and the
various definitions in port, link and nametable code that
largely duplicated this information are removed.  This means
that amount of link statistics that can be returned is now
increased from 2k to 32k.

The buffer overflow check is now done just before the reply
message is passed over netlink or TIPC to a remote node and
the message indicating a truncated buffer is changed to a less
dramatic one (less CAPS), placed at the end of the message.
Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

dc1aed37

tipc: simplify print buffer handling in tipc_printf · e2dbd601

由 Erik Hugne 提交于 6月 29, 2012

tipc_printf was previously used both to construct debug traces
and to append data to buffers that should be sent over netlink
to the tipc-config application.  A global print_buffer was
used to format the string before it was copied to the actual
output buffer.  This could lead to concurrent access of the
global print_buffer, which then had to be lock protected.
This is simplified by changing tipc_printf to append data
directly to the output buffer using vscnprintf.

With the new implementation of tipc_printf, there is no longer
any risk of concurrent access to the internal log buffer, so
the lock (and the comments describing it) are no longer
strictly necessary.  However, there are still a few functions
that do grab this lock before resizing/dumping the log
buffer.  We leave the lock, and these functions untouched since
they will be removed with a subsequent commit that drops the
deprecated log buffer handling code
Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

e2dbd601

01 5月, 2012 1 次提交

tipc: compress out gratuitous extra carriage returns · 617d3c7a

由 Paul Gortmaker 提交于 4月 30, 2012

Some of the comment blocks are floating in limbo between two
functions, or between blocks of code.  Delete the extra line
feeds between any comment and its associated following block
of code, to be consistent with the majority of the rest of
the kernel.  Also delete trailing newlines at EOF and fix
a couple trivial typos in existing comments.

This is a 100% cosmetic change with no runtime impact.  We get
rid of over 500 lines of non-code, and being blank line deletes,
they won't even show up as noise in git blame.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

617d3c7a

25 2月, 2012 1 次提交

tipc: nuke the delimit static inline function. · 732efba4

由 Paul Gortmaker 提交于 2月 23, 2012

This "shortform" is actually longer than typing out what it is really
trying to do, and just makes reading the code more difficult, so
lets simply shoot it in the head.

In the case of log.c - the comparison is on a u32, so we can drop the
check for < 0 at the same time.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

732efba4

02 1月, 2011 3 次提交

tipc: Finish streamlining of debugging code · 6e7e309c

由 Allan Stephens 提交于 12月 31, 2010

Completes the simplification of TIPC's debugging capabilities. By default
TIPC includes no debugging code, and any debugging code added by developers
that calls the dbg() and dbg_macros() is compiled out. If debugging support
is enabled, TIPC prints out some additional data about its internal state
when certain abnormal conditions occur, and any developer-added calls to the
TIPC debug macros are compiled in.
Signed-off-by: NAllan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e7e309c

tipc: remove dump() and tipc_dump_dbg() · 7ced6890

由 Allan Stephens 提交于 12月 31, 2010

Eliminates calls to two debugging macros that are being completely obsoleted,
as well as any associated debugging routines that are no longer required.
Signed-off-by: NAllan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7ced6890

tipc: rename dbg.[ch] to log.[ch] · f5e75269

由 Allan Stephens 提交于 12月 31, 2010

As the first step in removing obsolete debugging code from TIPC the
files that implement TIPC's non-debug-related log buffer subsystem
are renamed to better reflect their true nature.
Signed-off-by: NAllan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5e75269

17 10月, 2010 1 次提交

tipc: cleanup function namespace · 31e3c3f6

由 stephen hemminger 提交于 10月 13, 2010

Do some cleanups of TIPC based on make namespacecheck
  1. Don't export unused symbols
  2. Eliminate dead code
  3. Make functions and variables local
  4. Rename buf_acquire to tipc_buf_acquire since it is used in several files

Compile tested only.
This make break out of tree kernel modules that depend on TIPC routines.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Acked-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31e3c3f6

24 9月, 2010 1 次提交

net: return operator cleanup · a02cec21

由 Eric Dumazet 提交于 9月 22, 2010

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a02cec21

19 3月, 2009 1 次提交

tipc: fix non-const printf format arguments · 4b704d59

由 Stephen Hemminger 提交于 3月 18, 2009

Fix warnings from current gcc about using non-const strings as printf
args in TIPC. Compile tested only (not a TIPC user).
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b704d59

05 5月, 2008 1 次提交

tipc: Exclude debug-only print buffer code when not debugging · 48c97139

由 Allan Stephens 提交于 5月 05, 2008

This patch modifies TIPC to only exclude debug-related print buffer
routines when debugging capabilities are not required.  It also
fixes up some related #defines that exceed 80 characters.
Signed-off-by: NAllan Stephens <allan.stephens@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

48c97139

openanolis / cloud-kernel 大约 2 年 前同步成功

openanolis / cloud-kernel
大约 2 年前同步成功