提交 · d78f802f18ed100c53f331cd59791cd82ccb9438 · openeuler / Kernel

08 2月, 2015 3 次提交

rds: Make rds_message_copy_from_user() return 0 on success. · d0a47d32

由 Sowmini Varadhan 提交于 2月 05, 2015

Commit 083735f4 ("rds: switch rds_message_copy_from_user() to iov_iter")
breaks rds_message_copy_from_user() semantics on success, and causes it
to return nbytes copied, when it should return 0. This commit fixes that bug.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d0a47d32

net: rds: Remove repeated function names from debug output · 11ac1199

由 Rasmus Villemoes 提交于 2月 05, 2015

The macro rdsdebug is defined as

  pr_debug("%s(): " fmt, __func__ , ##args)

Hence it doesn't make sense to include the name of the calling
function explicitly in the format string passed to rdsdebug.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

11ac1199

net: openvswitch: Support masked set actions. · 83d2b9ba

由 Jarno Rajahalme 提交于 2月 05, 2015

OVS userspace already probes the openvswitch kernel module for
OVS_ACTION_ATTR_SET_MASKED support.  This patch adds the kernel module
implementation of masked set actions.

The existing set action sets many fields at once.  When only a subset
of the IP header fields, for example, should be modified, all the IP
fields need to be exact matched so that the other field values can be
copied to the set action.  A masked set action allows modification of
an arbitrary subset of the supported header bits without requiring the
rest to be matched.

Masked set action is now supported for all writeable key types, except
for the tunnel key.  The set tunnel action is an exception as any
input tunnel info is cleared before action processing starts, so there
is no tunnel info to mask.

The kernel module converts all (non-tunnel) set actions to masked set
actions.  This makes action processing more uniform, and results in
less branching and duplicating the action processing code.  When
returning actions to userspace, the fully masked set actions are
converted back to normal set actions.  We use a kernel internal action
code to be able to tell the userspace provided and converted masked
set actions apart.
Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83d2b9ba

06 2月, 2015 9 次提交

tipc: eliminate race condition at multicast reception · cb1b7280

由 Jon Paul Maloy 提交于 2月 05, 2015

In a previous commit in this series we resolved a race problem during
unicast message reception.

Here, we resolve the same problem at multicast reception. We apply the
same technique: an input queue serializing the delivery of arriving
buffers. The main difference is that here we do it in two steps.
First, the broadcast link feeds arriving buffers into the tail of an
arrival queue, which head is consumed at the socket level, and where
destination lookup is performed. Second, if the lookup is successful,
the resulting buffer clones are fed into a second queue, the input
queue. This queue is consumed at reception in the socket just like
in the unicast case. Both queues are protected by the same lock, -the
one of the input queue.
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cb1b7280

tipc: simplify socket multicast reception · 3c724acd

由 Jon Paul Maloy 提交于 2月 05, 2015

The structure 'tipc_port_list' is used to collect port numbers
representing multicast destination socket on a receiving node.
The list is not based on a standard linked list, and is in reality
optimized for the uncommon case that there are more than one
multicast destinations per node. This makes the list handling
unecessarily complex, and as a consequence, even the socket
multicast reception becomes more complex.

In this commit, we replace 'tipc_port_list' with a new 'struct
tipc_plist', which is based on a standard list. We give the new
list stack (push/pop) semantics, someting that simplifies
the implementation of the function tipc_sk_mcast_rcv().
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c724acd

tipc: simplify connection abort notifications when links break · 708ac32c

由 Jon Paul Maloy 提交于 2月 05, 2015

The new input message queue in struct tipc_link can be used for
delivering connection abort messages to subscribing sockets. This
makes it possible to simplify the code for such cases.

This commit removes the temporary list in tipc_node_unlock()
used for transforming abort subscriptions to messages. Instead, the
abort messages are now created at the moment of lost contact, and
then added to the last failed link's generic input queue for delivery
to the sockets concerned.
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

708ac32c

tipc: resolve race problem at unicast message reception · c637c103

由 Jon Paul Maloy 提交于 2月 05, 2015

TIPC handles message cardinality and sequencing at the link layer,
before passing messages upwards to the destination sockets. During the
upcall from link to socket no locks are held. It is therefore possible,
and we see it happen occasionally, that messages arriving in different
threads and delivered in sequence still bypass each other before they
reach the destination socket. This must not happen, since it violates
the sequentiality guarantee.

We solve this by adding a new input buffer queue to the link structure.
Arriving messages are added safely to the tail of that queue by the
link, while the head of the queue is consumed, also safely, by the
receiving socket. Sequentiality is secured per socket by only allowing
buffers to be dequeued inside the socket lock. Since there may be multiple
simultaneous readers of the queue, we use a 'filter' parameter to reduce
the risk that they peek the same buffer from the queue, hence also
reducing the risk of contention on the receiving socket locks.

This solves the sequentiality problem, and seems to cause no measurable
performance degradation.

A nice side effect of this change is that lock handling in the functions
tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
will enable future simplifications of those functions.
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c637c103

tipc: use existing sk_write_queue for outgoing packet chain · 94153e36

由 Jon Paul Maloy 提交于 2月 05, 2015

The list for outgoing traffic buffers from a socket is currently
allocated on the stack. This forces us to initialize the queue for
each sent message, something costing extra CPU cycles in the most
critical data path. Later in this series we will introduce a new
safe input buffer queue, something that would force us to initialize
even the spinlock of the outgoing queue. A closer analysis reveals
that the queue always is filled and emptied within the same lock_sock()
session. It is therefore safe to use a queue aggregated in the socket
itself for this purpose. Since there already exists a queue for this
in struct sock, sk_write_queue, we introduce use of that queue in
this commit.
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94153e36

tipc: split up function tipc_msg_eval() · e3a77561

由 Jon Paul Maloy 提交于 2月 05, 2015

The function tipc_msg_eval() is in reality doing two related, but
different tasks. First it tries to find a new destination for named
messages, in case there was no first lookup, or if the first lookup
failed. Second, it does what its name suggests, evaluating the validity
of the message and its destination, and returning an appropriate error
code depending on the result.

This is confusing, and in this commit we choose to break it up into two
functions. A new function, tipc_msg_lookup_dest(), first attempts to find
a new destination, if the message is of the right type. If this lookup
fails, or if the message should not be subject to a second lookup, the
already existing tipc_msg_reverse() is called. This function performs
prepares the message for rejection, if applicable.
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e3a77561

tipc: enqueue arrived buffers in socket in separate function · d570d864

由 Jon Paul Maloy 提交于 2月 05, 2015

The code for enqueuing arriving buffers in the function tipc_sk_rcv()
contains long code lines and currently goes to two indentation levels.
As a cosmetic preparaton for the next commits, we break it out into
a separate function.
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d570d864

tipc: simplify message forwarding and rejection in socket layer · 1186adf7

由 Jon Paul Maloy 提交于 2月 05, 2015

Despite recent improvements, the handling of error codes and return
values at reception of messages in the socket layer is still confusing.

In this commit, we try to make it more comprehensible. First, we
separate between the return values coming from the functions called
by tipc_sk_rcv(), -those are TIPC specific error codes, and the
return values returned by tipc_sk_rcv() itself. Second, we don't
use the returned TIPC error code as indication for whether a buffer
should be forwarded/rejected or not; instead we use the buffer pointer
passed along with filter_msg(). This separation is necessary because
we sometimes want to forward messages even when there is no error
(i.e., protocol messages and successfully secondary looked up data
messages).
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1186adf7

tipc: reduce usage of context info in socket and link · c5898636

由 Jon Paul Maloy 提交于 2月 05, 2015

The most common usage of namespace information is when we fetch the
own node addess from the net structure. This leads to a lot of
passing around of a parameter of type 'struct net *' between
functions just to make them able to obtain this address.

However, in many cases this is unnecessary. The own node address
is readily available as a member of both struct tipc_sock and
tipc_link, and can be fetched from there instead.
The fact that the vast majority of functions in socket.c and link.c
anyway are maintaining a pointer to their respective base structures
makes this option even more compelling.

In this commit, we introduce the inline functions tsk_own_node()
and link_own_node() to make it easy for functions to fetch the node
address from those structs instead of having to pass along and
dereference the namespace struct.

In particular, we make calls to the msg_xx() functions in msg.{h,c}
context independent by directly passing them the own node address
as parameter when needed. Those functions should be regarded as
leaves in the code dependency tree, and it is hence desirable to
keep them namspace unaware.

Apart from a potential positive effect on cache behavior, these
changes make it easier to introduce the changes that will follow
later in this series.
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5898636

05 2月, 2015 19 次提交

sit: fix some __be16/u16 mismatches · a409caec

由 Eric Dumazet 提交于 2月 04, 2015

Fixes following sparse warnings :

net/ipv6/sit.c:1509:32: warning: incorrect type in assignment (different base types)
net/ipv6/sit.c:1509:32: expected restricted __be16 [usertype] sport
net/ipv6/sit.c:1509:32: got unsigned short
net/ipv6/sit.c:1514:32: warning: incorrect type in assignment (different base types)
net/ipv6/sit.c:1514:32: expected restricted __be16 [usertype] dport
net/ipv6/sit.c:1514:32: got unsigned short
net/ipv6/sit.c:1711:38: warning: incorrect type in argument 3 (different base types)
net/ipv6/sit.c:1711:38: expected unsigned short [unsigned] [usertype] value
net/ipv6/sit.c:1711:38: got restricted __be16 [usertype] sport
net/ipv6/sit.c:1713:38: warning: incorrect type in argument 3 (different base types)
net/ipv6/sit.c:1713:38: expected unsigned short [unsigned] [usertype] value
net/ipv6/sit.c:1713:38: got restricted __be16 [usertype] dport
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a409caec

net: remove some sparse warnings · 2ce1ee17

由 Eric Dumazet 提交于 2月 04, 2015

netdev_adjacent_add_links() and netdev_adjacent_del_links()
are static.

queue->qdisc has __rcu annotation, need to use RCU_INIT_POINTER()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ce1ee17

ip6_gre: fix endianness errors in ip6gre_err · d1e158e2

由 Sabrina Dubroca 提交于 2月 04, 2015

info is in network byte order, change it back to host byte order
before use. In particular, the current code sets the MTU of the tunnel
to a wrong (too big) value.

Fixes: c12b395a ("gre: Support GRE over IPv6")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1e158e2

Revert "bridge: Let bridge not age 'externally' learnt FDB entries, they are... · 3fcf9011

由 David S. Miller 提交于 2月 04, 2015

Revert "bridge: Let bridge not age 'externally' learnt FDB entries, they are removed when 'external' entity notifies the aging"

This reverts commit 9a05dde5.

Requested by Scott Feldman.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3fcf9011

pkt_sched: fq: better control of DDOS traffic · 06eb395f

由 Eric Dumazet 提交于 2月 04, 2015

FQ has a fast path for skb attached to a socket, as it does not
have to compute a flow hash. But for other packets, FQ being non
stochastic means that hosts exposed to random Internet traffic
can allocate million of flows structure (104 bytes each) pretty
easily. Not only host can OOM, but lookup in RB trees can take
too much cpu and memory resources.

This patch adds a new attribute, orphan_mask, that is adding
possibility of having a stochastic hash for orphaned skb.

Its default value is 1024 slots, to mimic SFQ behavior.

Note: This does not apply to locally generated TCP traffic,
and no locally generated traffic will share a flow structure
with another perfect or stochastic flow.

This patch also handles the specific case of SYNACK messages:

They are attached to the listener socket, and therefore all map
to a single hash bucket. If listener have set SO_MAX_PACING_RATE,
hoping to have new accepted socket inherit this rate, SYNACK
might be paced and even dropped.

This is very similar to an internal patch Google have used more
than one year.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

06eb395f

tcp: do not pace pure ack packets · 98781965

由 Eric Dumazet 提交于 2月 03, 2015

When we added pacing to TCP, we decided to let sch_fq take care
of actual pacing.

All TCP had to do was to compute sk->pacing_rate using simple formula:

sk->pacing_rate = 2 * cwnd * mss / rtt

It works well for senders (bulk flows), but not very well for receivers
or even RPC :

cwnd on the receiver can be less than 10, rtt can be around 100ms, so we
can end up pacing ACK packets, slowing down the sender.

Really, only the sender should pace, according to its own logic.

Instead of adding a new bit in skb, or call yet another flow
dissection, we tweak skb->truesize to a small value (2), and
we instruct sch_fq to use new helper and not pace pure ack.

Note this also helps TCP small queue, as ack packets present
in qdisc/NIC do not prevent sending a data packet (RPC workload)

This helps to reduce tx completion overhead, ack packets can use regular
sock_wfree() instead of tcp_wfree() which is a bit more expensive.

This has no impact in the case packets are sent to loopback interface,
as we do not coalesce ack packets (were we would detect skb->truesize
lie)

In case netem (with a delay) is used, skb_orphan_partial() also sets
skb->truesize to 1.

This patch is a combination of two patches we used for about one year at
Google.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

98781965

netfilter: Use rhashtable walk iterator · 9a776628

由 Herbert Xu 提交于 2月 04, 2015

This patch gets rid of the manual rhashtable walk in nft_hash
which touches rhashtable internals that should not be exposed.
It does so by using the rhashtable iterator primitives.

Note that I'm leaving nft_hash_destroy alone since it's only
invoked on shutdown and it shouldn't be affected by changes
to rhashtable internals (or at least not what I'm planning to
change).
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a776628

netlink: Use rhashtable walk iterator · 56d28b1e

由 Herbert Xu 提交于 2月 04, 2015

This patch gets rid of the manual rhashtable walk in netlink
which touches rhashtable internals that should not be exposed.
It does so by using the rhashtable iterator primitives.

In fact the existing code was very buggy.  Some sockets weren't
shown at all while others were shown more than once.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56d28b1e

cls_api.c: Fix dumping of non-existing actions' stats. · b057df24

由 Ignacy Gawędzki 提交于 2月 03, 2015

In tcf_exts_dump_stats(), ensure that exts->actions is not empty before
accessing the first element of that list and calling tcf_action_copy_stats()
on it. This fixes some random segvs when adding filters of type "basic" with
no particular action.

This also fixes the dumping of those "no-action" filters, which more often
than not made calls to tcf_action_copy_stats() fail and consequently netlink
attributes added by the caller to be removed by a call to nla_nest_cancel().

Fixes: 33be6271 ("net_sched: act: use standard struct list_head")
Signed-off-by: NIgnacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Acked-by: NCong Wang <cwang@twopensource.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b057df24

pkt_sched: fq: avoid hang when quantum 0 · 3725a269

由 Kenneth Klette Jonassen 提交于 2月 03, 2015

Configuring fq with quantum 0 hangs the system, presumably because of a
non-interruptible infinite loop. Either way quantum 0 does not make sense.

Reproduce with:
sudo tc qdisc add dev lo root fq quantum 0 initial_quantum 0
ping 127.0.0.1
Signed-off-by: NKenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3725a269

net/core: Add event for a change in slave state · 61bd3857

由 Moni Shoua 提交于 2月 03, 2015

Add event which provides an indication on a change in the state
of a bonding slave. The event handler should cast the pointer to the
appropriate type (struct netdev_bonding_info) in order to get the
full info about the slave.
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61bd3857

tipc: separate link starting event from link timeout event · af9946fd

由 Jon Paul Maloy 提交于 2月 03, 2015

When a new link instance is created, it is trigged to start by
sending it a TIPC_STARTING_EVT, whereafter a regular link
reset is applied to it.

The starting event is codewise treated as a timeout event, and prompts
a link RESET message to be sent to the peer node, carrying a link
session identifier. The later link_reset() call nudges this session
identifier, whereafter all subsequent RESET messages will be sent out
with the new identifier. The latter session number overrides the former,
causing the peer to unconditionally accept it irrespective of its
current working state.

We don't think that this causes any problem, but it is not in accordance
with the protocol spec, and may cause confusion when debugging TIPC
sessions.

To avoid this, we make the starting event distinct from the
subsequent timeout events, by not allowing the former to send
out any RESET message. This eliminates the described problem.
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af9946fd

tipc: eliminate race during node creation · b45db71b

由 Jon Paul Maloy 提交于 2月 03, 2015

Instances of struct node are created in the function tipc_disc_rcv()
under the assumption that there is no race between received discovery
messages arriving from the same node. This assumption is wrong.
When we use more than one bearer, it is possible that discovery
messages from the same node arrive at the same moment, resulting in
creation of two instances of struct tipc_node. This may later cause
confusion during link establishment, and may result in one of the links
never becoming activated.

We fix this by making lookup and potential creation of nodes atomic.
Instead of first looking up the node, and in case of failure, create it,
we now start with looking up the node inside node_link_create(), and
return a reference to that one if found. Otherwise, we go ahead and
create the node as we did before.
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b45db71b

tipc: avoid stale link after aborted failover · 7d24dcdb

由 Jon Paul Maloy 提交于 2月 03, 2015

During link failover it may happen that the remaining link goes
down while it is still in the process of taking over traffic
from a previously failed link. When this happens, we currently
abort the failover procedure and reset the first failed link to
non-failover mode, so that it will be ready to re-establish
contact with its peer when it comes available.

However, if the first link goes down because its bearer was manually
disabled, it is not enough to reset it; it must also be deleted;
which is supposed to happen when the failover procedure is finished.
Otherwise it will remain a zombie link: attached to the owner node
structure, in mode LINK_STOPPED, and permanently blocking any re-
establishing of the link to the peer via the interface in question.

We fix this by amending the failover abort procedure. Apart from
resetting the link to non-failover state, we test if the link is
also in LINK_STOPPED mode. If so, we delete it, using the conditional
tipc_link_delete() function introduced in the previous commit.
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d24dcdb

tipc: add reference count to struct tipc_link · 2d72d495

由 Jon Paul Maloy 提交于 2月 03, 2015

When a bearer is disabled, all pertaining links will be reset and
deleted. However, if there is a second active link towards a killed
link's destination, the delete has to be postponed until the failover
is finished. During this interval, we currently put the link in zombie
mode, i.e., we take it out of traffic, delete its timer, but leave it
attached to the owner node structure until all missing packets have
been received. When this is done, we detach the link from its node
and delete it, assuming that the synchronous timer deletion that was
initiated earlier in a different thread has finished.

This is unsafe, as the failover may finish before del_timer_sync()
has returned in the other thread.

We fix this by adding an atomic reference counter of type kref in
struct tipc_link. The counter keeps track of the references kept
to the link by the owner node and the timer. We then do a conditional
delete, based on the reference counter, both after the failover has
been finished and when the timer expires, if applicable. Whoever
comes last, will actually delete the link. This approach also implies
that we can make the deletion of the timer asynchronous.
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d72d495

net: rds: use correct size for max unacked packets and bytes · db27ebb1

由 Sasha Levin 提交于 2月 03, 2015

Max unacked packets/bytes is an int while sizeof(long) was used in the
sysctl table.

This means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.
Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db27ebb1

net: add skb functions to process remote checksum offload · dcdc8994

由 Tom Herbert 提交于 2月 02, 2015

This patch adds skb_remcsum_process and skb_gro_remcsum_process to
perform the appropriate adjustments to the skb when receiving
remote checksum offload.

Updated vxlan and gue to use these functions.

Tested: Ran TCP_RR and TCP_STREAM netperf for VXLAN and GUE, did
not see any change in performance.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dcdc8994

bridge: Let bridge not age 'externally' learnt FDB entries, they are removed... · 9a05dde5

由 Siva Mannem 提交于 2月 02, 2015

bridge: Let bridge not age 'externally' learnt FDB entries, they are removed when 'external' entity notifies the aging

When 'learned_sync' flag is turned on, the offloaded switch
 port syncs learned MAC addresses to bridge's FDB via switchdev notifier
 (NETDEV_SWITCH_FDB_ADD). Currently, FDB entries learnt via this mechanism are
 wrongly being deleted by bridge aging logic. This patch ensures that FDB
 entries synced from offloaded switch ports are not deleted by bridging logic.
 Such entries can only be deleted via switchdev notifier
 (NETDEV_SWITCH_FDB_DEL).
Signed-off-by: NSiva Mannem <siva.mannem.lnx@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a05dde5

xps: fix xps for stacked devices · 2bd82484

由 Eric Dumazet 提交于 2月 03, 2015

A typical qdisc setup is the following :

bond0 : bonding device, using HTB hierarchy
eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc

XPS allows to spread packets on specific tx queues, based on the cpu
doing the send.

Problem is that dequeues from bond0 qdisc can happen on random cpus,
due to the fact that qdisc_run() can dequeue a batch of packets.

CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0

CPUB -> dequeue packet P1 from bond0
        enqueue packet on eth1/eth2
CPUC -> dequeue packet P2 from bond0
        enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)

get_xps_queue() then might select wrong queue for P1, since current cpu
might be different than CPUA.

P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping),
if CPUC runs a bit faster (or CPUB spins a bit on qdisc lock)

Effect of this bug is TCP reorders, and more generally not optimal
TX queue placement. (A victim bulk flow can be migrated to the wrong TX
queue for a while)

To fix this, we have to record sender cpu number the first time
dev_queue_xmit() is called for one tx skb.

We can union napi_id (used on receive path) and sender_cpu,
granted we clear sender_cpu in skb_scrub_packet() (credit to Willem for
this union idea)
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2bd82484

04 2月, 2015 9 次提交

NFC: nci: Move NFCEE discovery logic · fa00e8fe

由 Christophe Ricard 提交于 2月 03, 2015

NFCEE_DISCOVER_CMD is a specified NCI command used to discover
NFCEE IDs.
Move nci_nfcee_discover() call to nci_discover_se() in order to
guarantee:
- NFCEE_DISCOVER_CMD run when the NCI state machine is initialized
- NFCEE_DISCOVER_CMD is not run in case there is not discover_se
  hook defined by a NFC device driver.
Signed-off-by: NChristophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

fa00e8fe

NFC: nci: Move logical connection structure allocation · 15d4a8da

由 Christophe Ricard 提交于 2月 03, 2015

conn_info is currently allocated only after nfcee_discovery_ntf
which is not generic enough for logical connection other than
NFCEE. The corresponding conn_info is now created in
nci_core_conn_create_rsp().
Signed-off-by: NChristophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

15d4a8da

NFC: nci: Change credits field to credits_cnt · 3ba5c846

由 Christophe Ricard 提交于 2月 03, 2015

For consistency sake change nci_core_conn_create_rsp structure
credits field to credits_cnt.
Signed-off-by: NChristophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

3ba5c846

NFC: nci: Support all destinations type when creating a connection · b16ae716

由 Christophe Ricard 提交于 2月 03, 2015

The current implementation limits nci_core_conn_create_req()
to only manage NCI_DESTINATION_NFCEE.
Add new parameters to nci_core_conn_create() to support all
destination types described in the NCI specification.
Because there are some parameters with variable size dynamic
buffer allocation is needed.
Signed-off-by: NChristophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

b16ae716

NFC: nci: Add reference to the RF logical connection · 12bdf27d

由 Christophe Ricard 提交于 2月 03, 2015

The NCI_STATIC_RF_CONN_ID logical connection is the most used
connection. Keeping it directly accessible in the nci_dev
structure will simplify and optimize the access.
Signed-off-by: NChristophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

12bdf27d

ipv6: Select fragment id during UFO segmentation if not set. · 0508c07f

由 Vlad Yasevich 提交于 2月 03, 2015

If the IPv6 fragment id has not been set and we perform
fragmentation due to UFO, select a new fragment id.
We now consider a fragment id of 0 as unset and if id selection
process returns 0 (after all the pertrubations), we set it to
0x80000000, thus giving us ample space not to create collisions
with the next packet we may have to fragment.

When doing UFO integrity checking, we also select the
fragment id if it has not be set yet.   This is stored into
the skb_shinfo() thus allowing UFO to function correclty.

This patch also removes duplicate fragment id generation code
and moves ipv6_select_ident() into the header as it may be
used during GSO.
Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0508c07f

A
net: switch sockets to ->read_iter/->write_iter · 8ae5e030
由 Al Viro 提交于 11月 28, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
8ae5e030
A
net/socket.c: fold do_sock_{read,write} into callers · 6d652330
由 Al Viro 提交于 1月 30, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
6d652330
A
net: bury net/core/iovec.c - nothing in there is used anymore · 31a25fae
由 Al Viro 提交于 11月 28, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
31a25fae

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功