提交 · 4bac50bace0377138fed2ac3627c1ad470ea1eca · openeuler / Kernel

06 10月, 2015 1 次提交

ipv4: Fix compilation errors in fib_rebalance · 0a837fe4

由 Peter Nørlund 提交于 10月 06, 2015

This fixes

net/built-in.o: In function `fib_rebalance':
fib_semantics.c:(.text+0x9df14): undefined reference to `__divdi3'

and

net/built-in.o: In function `fib_rebalance':
net/ipv4/fib_semantics.c:572: undefined reference to `__aeabi_ldivmod'

Fixes: 0e884c78 ("ipv4: L3 hash-based multipath")
Signed-off-by: NPeter Nørlund <pch@ordbogen.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a837fe4

05 10月, 2015 38 次提交

bpf, seccomp: prepare for upcoming criu support · bab18991

由 Daniel Borkmann 提交于 10月 02, 2015

The current ongoing effort to dump existing cBPF seccomp filters back
to user space requires to hold the pre-transformed instructions like
we do in case of socket filters from sk_attach_filter() side, so they
can be reloaded in original form at a later point in time by utilities
such as criu.

To prepare for this, simply extend the bpf_prog_create_from_user()
API to hold a flag that tells whether we should store the original
or not. Also, fanout filters could make use of that in future for
things like diag. While fanout filters already use bpf_prog_destroy(),
move seccomp over to them as well to handle original programs when
present.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: NTycho Andersen <tycho.andersen@canonical.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bab18991

RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_xmit · 76b29ef1

由 Sowmini Varadhan 提交于 9月 30, 2015

For the same reasons as commit 2f533844 ("tcp: allow splice() to
build full TSO packets") and commit 35f9c09f ("tcp: tcp_sendpages()
should call tcp_push() once"), rds_tcp_xmit may have multiple pages to
send, so use the MSG_MORE and MSG_SENDPAGE_NOTLAST as hints to
tcp_sendpage()
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76b29ef1

RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune · 1edd6a14

由 Sowmini Varadhan 提交于 9月 30, 2015

Using the value of RDS_TCP_DEFAULT_BUFSIZE (128K)
clobbers efficient use of TSO because it inflates the size_goal
that is computed in tcp_sendmsg/tcp_sendpage and skews packet
latency, and the default values for these parameters actually
results in significantly better performance.

In request-response tests using rds-stress with a packet size of
100K with 16 threads (test parameters -q 100000 -a 256 -t16 -d16)
between a single pair of IP addresses achieves a throughput of
6-8 Gbps. Without this patch, throughput maxes at 2-3 Gbps under
equivalent conditions on these platforms.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1edd6a14

RDS: Use a single TCP socket for both send and receive. · 3b20fc38

由 Sowmini Varadhan 提交于 9月 30, 2015

Commit f711a6ae ("net/rds: RDS-TCP: Always create a new rds_sock
for an incoming connection.") modified rds-tcp so that an incoming SYN
would ignore an existing "client" TCP connection which had the local
port set to the transient port. The motivation for ignoring the existing
"client" connection in f711a6ae was to avoid race conditions and an
endless duel of reconnect attempts triggered by a restart/abort of one
of the nodes in the TCP connection.

However, having separate sockets for active and passive sides
is avoidable, and the simpler model of a single TCP socket for
both send and receives of all RDS connections associated with
that tcp socket makes for easier observability. We avoid the race
conditions from f711a6ae by attempting reconnects in rds_conn_shutdown
if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP.
The c_outgoing bit is initialized in __rds_conn_create().

A side-effect of re-using the client rds_connection for an incoming
SYN is the potential of encountering duelling SYNs, i.e., we
have an outgoing RDS_CONN_CONNECTING socket when we get the incoming
SYN. The logic to arbitrate this criss-crossing SYN exchange in
rds_tcp_accept_one() has been modified to emulate the BGP state
machine: the smaller IP address should back off from the connection attempt.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b20fc38

tcp: restore fastopen operations · ac8cfc7b

由 Eric Dumazet 提交于 9月 30, 2015

I accidentally cleared fastopenq.max_qlen in reqsk_queue_alloc()
while max_qlen can be set before listen() is called,
using TCP_FASTOPEN socket option for example.

Fixes: 0536fcc0 ("tcp: prepare fastopen code for upcoming listener changes")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac8cfc7b

net: sctp: avoid incorrect time_t use · 3ef0a25b

由 Arnd Bergmann 提交于 9月 30, 2015

We want to avoid using time_t in the kernel because of the y2038
overflow problem. The use in sctp is not for storing seconds at
all, but instead uses microseconds and is passed as 32-bit
on all machines.

This patch changes the type to u32, which better fits the use.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: linux-sctp@vger.kernel.org
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ef0a25b

ipv6: use ktime_t for internal timestamps · 3dd7669f

由 Arnd Bergmann 提交于 9月 30, 2015

The ipv6 mip6 implementation is one of only a few users of the
skb_get_timestamp() function in the kernel, which is both unsafe
on 32-bit architectures because of the 2038 overflow, and slightly
less efficient than the skb_get_ktime() based approach.

This converts the function call and the mip6_report_rate_limiter
structure that stores the time stamp, eliminating all uses of
timeval in the ipv6 code.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3dd7669f

nfnetlink: use y2038 safe timestamp · f6389ecb

由 Arnd Bergmann 提交于 9月 30, 2015

The __build_packet_message function fills a nfulnl_msg_packet_timestamp
structure that uses 64-bit seconds and is therefore y2038 safe, but
it uses an intermediate 'struct timespec' which is not.

This trivially changes the code to use 'struct timespec64' instead,
to correct the result on 32-bit architectures.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6389ecb

mac80211: use ktime_get_seconds · 84b00607

由 Arnd Bergmann 提交于 9月 30, 2015

The mac80211 code uses ktime_get_ts to measure the connected time.
As this uses monotonic time, it is y2038 safe on 32-bit systems,
but we still want to deprecate the use of 'timespec' because most
other users are broken.

This changes the code to use ktime_get_seconds() instead, which
avoids the timespec structure and is slightly more efficient.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84b00607

ipv4: ICMP packet inspection for multipath · 79a13159

由 Peter Nørlund 提交于 9月 30, 2015

ICMP packets are inspected to let them route together with the flow they
belong to, minimizing the chance that a problematic path will affect flows
on other paths, and so that anycast environments can work with ECMP.
Signed-off-by: NPeter Nørlund <pch@ordbogen.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79a13159

ipv4: L3 hash-based multipath · 0e884c78

由 Peter Nørlund 提交于 9月 30, 2015

Replaces the per-packet multipath with a hash-based multipath using
source and destination address.
Signed-off-by: NPeter Nørlund <pch@ordbogen.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e884c78

tcp: avoid two atomic ops for syncookies · a1a5344d

由 Eric Dumazet 提交于 10月 04, 2015

inet_reqsk_alloc() is used to allocate a temporary request
in order to generate a SYNACK with a cookie. Then later,
syncookie validation also uses a temporary request.

These paths already took a reference on listener refcount,
we can avoid a couple of atomic operations.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a1a5344d

net: use sk_fullsock() in __netdev_pick_tx() · 004a5d01

由 Eric Dumazet 提交于 10月 04, 2015

SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a
sk_dst_cache pointer.

Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

004a5d01

tcp: fix fastopen races vs lockless listener · 7656d842

由 Eric Dumazet 提交于 10月 04, 2015

There are multiple races that need fixes :

1) skb_get() + queue skb + kfree_skb() is racy

An accept() can be done on another cpu, data consumed immediately.
tcp_recvmsg() uses __kfree_skb() as it is assumed all skb found in
socket receive queue are private.

Then the kfree_skb() in tcp_rcv_state_process() uses an already freed skb

2) tcp_reqsk_record_syn() needs to be done before tcp_try_fastopen()
for the same reasons.

3) We want to send the SYNACK before queueing child into accept queue,
otherwise we might reintroduce the ooo issue fixed in
commit 7c85af88 ("tcp: avoid reorders for TFO passive connections")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7656d842

bridge: netlink: add support for default_pvid · 0f963b75

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_VLAN_DEFAULT_PVID to allow setting/getting bridge's
default_pvid via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f963b75

bridge: netlink: add support for netfilter tables config · 93870cc0

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add support to allow getting/setting netfilter tables settings.
Currently these are IFLA_BR_NF_CALL_IPTABLES, IFLA_BR_NF_CALL_IP6TABLES
and IFLA_BR_NF_CALL_ARPTABLES.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93870cc0

bridge: netlink: add support for igmp's intervals · 7e4df51e

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add support to set/get all of the igmp's configurable intervals via
netlink. These currently are:
IFLA_BR_MCAST_LAST_MEMBER_INTVL
IFLA_BR_MCAST_MEMBERSHIP_INTVL
IFLA_BR_MCAST_QUERIER_INTVL
IFLA_BR_MCAST_QUERY_INTVL
IFLA_BR_MCAST_QUERY_RESPONSE_INTVL
IFLA_BR_MCAST_STARTUP_QUERY_INTVL
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e4df51e

bridge: netlink: add support for multicast_startup_query_count · b89e6bab

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_MCAST_STARTUP_QUERY_CNT to allow setting/getting
br->multicast_startup_query_count via netlink. Also align the ifla
comments.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b89e6bab

bridge: netlink: add support for multicast_last_member_count · 79b859f5

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_MCAST_LAST_MEMBER_CNT to allow setting/getting
br->multicast_last_member_count via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79b859f5

bridge: netlink: add support for igmp's hash_max · 858079fd

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_MCAST_HASH_MAX to allow setting/getting br->hash_max via
netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

858079fd

bridge: netlink: add support for igmp's hash_elasticity · 431db3c0

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_MCAST_HASH_ELASTICITY to allow setting/getting
br->hash_elasticity via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

431db3c0

bridge: netlink: add support for multicast_querier · ba062d7c

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_MCAST_QUERIER to allow setting/getting br->multicast_querier
via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba062d7c

bridge: netlink: add support for multicast_query_use_ifaddr · 295141d9

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_MCAST_QUERY_USE_IFADDR to allow setting/getting
br->multicast_query_use_ifaddr via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

295141d9

bridge: netlink: add support for multicast_snooping · 89126327

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_MCAST_SNOOPING to allow enabling/disabling multicast
snooping via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89126327

bridge: netlink: add support for multicast_router · a9a6bc70

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_MCAST_ROUTER to allow setting and retrieving
br->multicast_router when igmp snooping is enabled.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a9a6bc70

bridge: netlink: add fdb flush · 150217c6

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Simple attribute that flushes the bridge's fdb.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

150217c6

bridge: netlink: add group_addr support · 111189ab

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_GROUP_ADDR attribute to allow setting and retrieving the
group_addr via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

111189ab

bridge: netlink: export all timers · d76bd14e

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Export the following bridge timers (also exported via sysfs):
IFLA_BR_HELLO_TIMER, IFLA_BR_TCN_TIMER, IFLA_BR_TOPOLOGY_CHANGE_TIMER,
IFLA_BR_GC_TIMER via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d76bd14e

bridge: netlink: export topology_change and topology_change_detected · ed416309

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_TOPOLOGY_CHANGE and IFLA_BR_TOPOLOGY_CHANGE_DETECTED and
export them via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ed416309

bridge: netlink: export root path cost · 684dd248

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_ROOT_PATH_COST and export it via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

684dd248

bridge: netlink: export root port · 8762ba68

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_ROOT_PORT and export it via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8762ba68

bridge: netlink: export bridge id · 7599a220

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_BRIDGE_ID and export br->bridge_id via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7599a220

bridge: netlink: export root id · 5127c81f

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_ROOT_ID and export br->designated_root via netlink. For this
purpose add struct ifla_bridge_id that would represent struct bridge_id.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5127c81f

bridge: netlink: add group_fwd_mask support · 7910228b

由 Nikolay Aleksandrov 提交于 10月 04, 2015

Add IFLA_BR_GROUP_FWD_MASK attribute to allow setting and retrieving the
group_fwd_mask via netlink.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7910228b

bridge: vlan: use br_vlan_should_use to simplify __vlan_add/del · 6be144f6

由 Nikolay Aleksandrov 提交于 10月 02, 2015

The checks that lead to num_vlans change are always what
br_vlan_should_use checks for, namely if the vlan is only a context or
not and depending on that it's either not counted or counted
as a real/used vlan respectively.
Also give better explanation in br_vlan_should_use's comment.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6be144f6

bridge: vlan: drop master_flags from __vlan_add · 2ffdf508

由 Nikolay Aleksandrov 提交于 10月 02, 2015

There's only one user now and we can include the flag directly.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ffdf508

bridge: vlan: use br_vlan_(get|put)_master to deal with refcounts · f8ed289f

由 Nikolay Aleksandrov 提交于 10月 02, 2015

Introduce br_vlan_(get|put)_master which take a reference (or create the
master vlan first if it didn't exist) and drop a reference respectively.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8ed289f

bridge: vlan: use rcu list for the ordered vlan list · 586c2b57

由 Nikolay Aleksandrov 提交于 10月 02, 2015

When I did the conversion to rhashtable I missed the required locking of
one important user of the vlan list - br_get_link_af_size_filtered()
which is called:
br_ifinfo_notify() -> br_nlmsg_size() -> br_get_link_af_size_filtered()
and the notifications can be sent without holding rtnl. Before this
conversion the function relied on using rcu and since we already use rcu to
destroy the vlans, we can simply migrate the list to use the rcu helpers.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

586c2b57

04 10月, 2015 1 次提交

tcp/dccp: add SLAB_DESTROY_BY_RCU flag for request sockets · e96f78ab

由 Eric Dumazet 提交于 10月 03, 2015

Before letting request sockets being put in TCP/DCCP regular
ehash table, we need to add either :

- SLAB_DESTROY_BY_RCU flag to their kmem_cache
- add RCU grace period before freeing them.

Since we carefully respected the SLAB_DESTROY_BY_RCU protocol
like ESTABLISH and TIMEWAIT sockets, use it here.

req_prot_init() being only used by TCP and DCCP, I did not add
a new slab_flags into their rsk_prot, but reuse prot->slab_flags

Since all reqsk_alloc() users are correctly dealing with a failure,
add the __GFP_NOWARN flag to avoid traces under pressure.

Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e96f78ab

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功