Commits · b89afb116ca2830cc982624f93e888860868a84b · gsplhtlxg / clone-Linux

Mar 24, 2018 · 13 commits

tipc: allow closest-first lookup algorithm when legacy address is configured · b89afb11

Committed by Jon Maloy on Mar 22, 2018

The removal of an internal structure of the node address has an unwanted
side effect.
- Currently, if a user sends an anycast message with destination
  domain 0, the tipc_nametbl_translate() function uses the
  'closest-first' algorithm to first look for a node local destination,
  and only when no such destination is found does it resort to the
  cluster global 'round-robin' lookup algorithm.
- Current users can get around this, and enforce unconditional use of
  global round-robin, by indicating a destination as Z.0.0 or Z.C.0.
- This option disappears when we make the node address flat, since the
  lookup algorithm has no way of recognizing this case. So, as long as
  there are node local destinations, the algorithm will always select
  one of those, and there is nothing the sender can do to change this.

We solve this by eliminating the 'closest-first' option, which was never
a good idea anyway, for non-legacy users only. To distinguish between
legacy and non-legacy users we introduce a new flag 'legacy_addr_format'
in struct tipc_core, to be set when the user configures a legacy-style
Z.C.N node address. Hence, when a legacy user indicates a zero lookup
domain, 'closest-first' is selected; in all other cases we use
'round-robin'.
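
A minimal sketch of the selection rule above follows. It assumes a reduced configuration struct. The enum, `struct core_cfg` and `choose_lookup()` are hypothetical names. Only the flag name `legacy_addr_format` comes from the commit message, and this is not the code in net/tipc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only legacy_addr_format is taken from the commit text. */
enum lookup_algo { LOOKUP_CLOSEST_FIRST, LOOKUP_ROUND_ROBIN };

struct core_cfg {
	bool legacy_addr_format; /* set when a Z.C.N node address was configured */
};

/* Pick the anycast lookup algorithm for a given lookup domain. */
static enum lookup_algo choose_lookup(const struct core_cfg *cfg, uint32_t domain)
{
	/* Only legacy users asking for lookup domain 0 keep 'closest-first'. */
	if (cfg->legacy_addr_format && domain == 0)
		return LOOKUP_CLOSEST_FIRST;
	/* Everyone else, and any non-zero domain, gets 'round-robin'. */
	return LOOKUP_ROUND_ROBIN;
}
```

With this rule, a non-legacy sender always gets round-robin, so the "no way to opt out" problem described above cannot arise.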

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: remove restrictions on node address values · 20263641

Committed by Jon Maloy on Mar 22, 2018

Nominally, TIPC organizes network nodes into a three-level network
hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
hierarchy is reflected in the node address format: it is sub-divided
into an 8-bit zone id, a 12-bit cluster id, and a 12-bit node id.
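
For illustration, the legacy Z.C.N layout can be expressed as plain bit packing. This sketch assumes the zone occupies the most significant 8 bits, followed by the cluster and then the node, as the Z.C.N notation suggests. The helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers for the legacy <8-bit zone>.<12-bit cluster>.<12-bit node> layout. */
static uint32_t addr_pack(uint32_t zone, uint32_t cluster, uint32_t node)
{
	return ((zone & 0xffu) << 24) | ((cluster & 0xfffu) << 12) | (node & 0xfffu);
}

static uint32_t addr_zone(uint32_t addr)    { return addr >> 24; }            /* top 8 bits */
static uint32_t addr_cluster(uint32_t addr) { return (addr >> 12) & 0xfffu; } /* middle 12 bits */
static uint32_t addr_node(uint32_t addr)    { return addr & 0xfffu; }         /* low 12 bits */
```

Under this layout, a node within one cluster can only take 4096 distinct values, which is the 12-bit limit the commit describes. After this commit the whole 32-bit value is simply the node identity, and no such split applies.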

However, the 'zone' and 'cluster' levels have in reality never been
fully implemented, and never will be. The result of this has been
that the first 20 bits of the node identity structure have been wasted,
and the usable node identity range within a cluster has been limited
to 12 bits. This is starting to become a problem.

In the following commits, we will need to be able to connect between
nodes which are using the whole 32-bit value space of the node address.
We therefore remove the restrictions on which values can be assigned
to the node identity; from now on it is only a 32-bit integer with no
assumed internal structure.

Isolation between clusters is now achieved only by setting different
values for the 'network id' field used during neighbor discovery, in
practice leading to the latter becoming the new cluster identity.

The rules for accepting discovery requests/responses from neighboring
nodes now become:

- If the user is using legacy address format on both peers, reception
  of discovery messages is subject to the legacy lookup domain check
  in addition to the cluster id check.

- Otherwise, the discovery request/response is always accepted, provided
  both peers have the same network id.

This secures backwards compatibility for users who have been using zone
or cluster identities as cluster separators, instead of the intended
'network id'.
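
The two acceptance rules can be sketched as a single predicate. Everything here is a hypothetical illustration. `struct peer_info`, `in_legacy_domain()` and `accept_discovery()` are invented names. The Z.0.0 / Z.C.0 / Z.C.N matching is an assumption about how the legacy lookup domain check behaves, and it reuses the bit layout of the Z.C.N address format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical peer description; field names are illustrative only. */
struct peer_info {
	uint32_t net_id;  /* 'network id' used during neighbor discovery */
	bool legacy_addr; /* peer uses the legacy Z.C.N address format */
	uint32_t addr;    /* node address */
};

/* Assumed legacy domain semantics: 0 matches anything, Z.0.0 matches the
 * zone, Z.C.0 matches the cluster, Z.C.N matches exactly one node. */
static bool in_legacy_domain(uint32_t domain, uint32_t addr)
{
	if (domain == 0 || domain == addr)
		return true;
	if ((domain & 0xfffu) != 0)            /* Z.C.N: exact match only */
		return false;
	if ((domain & 0xfff000u) != 0)         /* Z.C.0: same zone and cluster */
		return (domain >> 12) == (addr >> 12);
	return (domain >> 24) == (addr >> 24); /* Z.0.0: same zone */
}

static bool accept_discovery(const struct peer_info *self, uint32_t domain,
			     const struct peer_info *peer)
{
	if (self->net_id != peer->net_id)
		return false;                  /* network id separates clusters */
	if (self->legacy_addr && peer->legacy_addr)
		return in_legacy_domain(domain, peer->addr);
	return true;                           /* otherwise same net id suffices */
}
```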

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: some cleanups in the file discover.c · b39e465e

Committed by Jon Maloy on Mar 22, 2018

To facilitate the coming changes in the neighbor discovery functionality
we do some renaming and refactoring of that code. The functional changes
in this commit are trivial, e.g., we move the message sending call in
tipc_disc_timeout() outside the spinlock-protected region.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: refactor function tipc_enable_bearer() · cb30a633

Committed by Jon Maloy on Mar 22, 2018

As a preparation for the next commits we try to reduce the footprint of
the function tipc_enable_bearer(), while hopefully making it simpler to
follow.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert rxrpc_net_ops · b2864fbd

Committed by Kirill Tkhai on Mar 22, 2018

These pernet_operations modify rxrpc_net_id-pointed
per-net entities. There is an external link to AF_RXRPC
in fs/afs/Kconfig, but it seems there are no other
pernet_operations interested in those per-net entities.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert udp_sysctl_ops · fc18999e

Committed by Kirill Tkhai on Mar 22, 2018

These pernet_operations just initialize udp4 defaults.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: fix direct access to bridge vlan_enabled and use helper · 82792a07

Committed by Nikolay Aleksandrov on Mar 23, 2018

We need to use the br_vlan_enabled() helper; otherwise we break builds
without bridge VLAN support:
net/bridge//br_if.c: In function ‘br_mtu’:
net/bridge//br_if.c:458:8: error: ‘const struct net_bridge’ has no
member named ‘vlan_enabled’
  if (br->vlan_enabled)
        ^
net/bridge//br_if.c:462:1: warning: control reaches end of non-void
function [-Wreturn-type]
 }
 ^
scripts/Makefile.build:324: recipe for target 'net/bridge//br_if.o'
failed

Fixes: 419d14af ("bridge: Allow max MTU when multiple VLANs present")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: RX path for ktls · c46234eb

Committed by Dave Watson on Mar 22, 2018

Add rx path for tls software implementation.

recvmsg, splice_read, and poll implemented.

An additional sockopt TLS_RX is added, with the same interface as
TLS_TX. Either TLS_RX or TLS_TX may be provided separately, or
together (with two different setsockopt calls with appropriate keys).

Control messages are passed via CMSG in a similar way to transmit.
If no cmsg buffer is passed, then only application data records
will be passed to userspace, and EIO is returned for other types of
alerts.

EBADMSG is returned for decryption errors, EMSGSIZE for framing that is
too big, and EBADMSG for framing that is too small (matching OpenSSL
semantics). EINVAL is returned for TLS versions that do not match the
original setsockopt call. All of these errors are unrecoverable.
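
The error mapping above can be illustrated with a check on the 5-byte TLS record header (type, 2-byte version, 2-byte length). This is a hedged sketch, not the kernel's code. The function name, the 2^14-byte payload limit, and the per-record cipher overhead of 24 bytes (an 8-byte explicit nonce plus a 16-byte tag, as for AES-GCM-128) are assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>

#define TLS_HEADER_LEN  5
#define MAX_PLAINTEXT   (1u << 14) /* assumed TLS record payload limit */
#define CIPHER_OVERHEAD 24u        /* assumed: 8-byte nonce + 16-byte tag */

/* Hypothetical: validate a record header; return full record length or -errno. */
static long check_record_header(const uint8_t *hdr, uint16_t expected_version)
{
	uint16_t version = (uint16_t)((hdr[1] << 8) | hdr[2]);
	uint16_t len = (uint16_t)((hdr[3] << 8) | hdr[4]);

	if (version != expected_version)
		return -EINVAL;   /* differs from the version given to setsockopt() */
	if (len > MAX_PLAINTEXT + CIPHER_OVERHEAD)
		return -EMSGSIZE; /* framing too big */
	if (len <= CIPHER_OVERHEAD)
		return -EBADMSG;  /* framing too small to hold nonce and tag */
	return (long)len + TLS_HEADER_LEN;
}
```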

strparser is used to parse TLS framing. Decryption is done directly
into userspace buffers if they are large enough to support it; otherwise
skb_cow_data is called (similar to ipsec), and buffers are decrypted in
place and copied. splice_read always decrypts in place, since no
buffers are provided to decrypt into.

sk_poll is overridden, and only returns POLLIN if a full TLS message is
received.  Otherwise we wait for strparser to finish reading a full frame.
Actual decryption is only done during recvmsg or splice_read calls.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: Refactor variable names · 58371585

Committed by Dave Watson on Mar 22, 2018

Several config variables are prefixed with tx; drop the prefix,
since these will be used for both tx and rx.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: Pass error code explicitly to tls_err_abort · f4a8e43f

Committed by Dave Watson on Mar 22, 2018

Pass EBADMSG explicitly to tls_err_abort. The receive path will
pass additional codes: EMSGSIZE if framing is larger than the max
TLS record size, and EINVAL on a TLS version mismatch.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: Move cipher info to a separate struct · dbe42559

Committed by Dave Watson on Mar 22, 2018

Move the tx crypto parameters into a separate cipher_context struct.
The same struct will be used for the rx parameters.

tls_advance_record_sn is modified to only take the cipher info.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: Generalize zerocopy_from_iter · 69ca9293

Committed by Dave Watson on Mar 22, 2018

Refactor zerocopy_from_iter to take arguments for pages and size,
such that it can be used for both tx and rx. RX will also support
zerocopy direct to output iter, as long as the full message can
be copied at once (a large enough userspace buffer was provided).

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Allow max MTU when multiple VLANs present · 419d14af

Committed by Chas Williams on Mar 22, 2018

If the bridge is allowing multiple VLANs, some VLANs may have
different MTUs.  Instead of choosing the minimum MTU for the
bridge interface, choose the maximum MTU of the bridge members.
With this, the user only needs to set a larger MTU on the member
ports that are participating in the large-MTU VLANs.
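
The policy change can be illustrated with a small helper that picks either the minimum or the maximum member MTU. This is a sketch, not br_mtu(). The function name, the plain array of member MTUs, and the 1500-byte fallback for a bridge without ports are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: mtus[] holds the MTUs of the bridge member ports. */
static int bridge_mtu(const int *mtus, size_t n, bool vlan_enabled)
{
	if (n == 0)
		return 1500; /* assumed fallback for a bridge with no ports */

	int mtu = mtus[0];
	for (size_t i = 1; i < n; i++) {
		/* With VLANs, take the largest member MTU; otherwise the smallest. */
		if (vlan_enabled ? mtus[i] > mtu : mtus[i] < mtu)
			mtu = mtus[i];
	}
	return mtu;
}
```

For members with MTUs {1500, 9000, 1500}, the bridge gets 1500 without VLAN filtering and 9000 with it.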

Signed-off-by: Chas Williams <3chas3@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Mar 23, 2018 · 13 commits

net: Replace ip_ra_lock with per-net mutex · d9ff3049

Committed by Kirill Tkhai on Mar 22, 2018

Since ra_chain is per-net, we may use per-net mutexes
to protect it in ip_ra_control(). This improves
scalability.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Make ip_ra_chain per struct net · 5796ef75

Committed by Kirill Tkhai on Mar 22, 2018

This is an optimization, which makes ip_call_ra_chain()
iterate over fewer sockets to find the ones it is looking for.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Revert "ipv4: fix a deadlock in ip_ra_control" · 128aaa98

Committed by Kirill Tkhai on Mar 22, 2018

This reverts commit 1215e51e.
Since raw_close() is used on every RAW socket destruction,
the changes made by 1215e51e scale poorly. This is clearly
seen in an endless unshare(CLONE_NEWNET) test, where the
cleanup_net() kwork spends a lot of time waiting for the
rtnl_lock() introduced by that commit.

The previous patch moved IP_ROUTER_ALERT out of rtnl_lock(),
so we revert this patch.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Move IP_ROUTER_ALERT out of lock_sock(sk) · 0526947f

Committed by Kirill Tkhai on Mar 22, 2018

ip_ra_control() does not need sk_lock. Who are the other
users of ip_ra_chain? ip_mroute_setsockopt() doesn't take
sk_lock, while parallel IP_ROUTER_ALERT syscalls are
synchronized by ip_ra_lock. So, we may move this command
out of sk_lock.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Revert "ipv4: get rid of ip_ra_lock" · 76d3e153

Committed by Kirill Tkhai on Mar 22, 2018

This reverts commit ba3f571d. The commit was made
after 1215e51e "ipv4: fix a deadlock in ip_ra_control",
and killed ip_ra_lock, which became useless once rtnl_lock()
was used to destroy every raw ipv4 socket. This scales
very badly, and the next patch in the series reverts 1215e51e.
ip_ra_lock will be used again.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gre: fix TUNNEL_SEQ bit check on sequence numbering · 15746394

Committed by Colin Ian King on Mar 21, 2018

The current logic of flags | TUNNEL_SEQ is always non-zero and hence
sequence numbers are always incremented no matter the setting of the
TUNNEL_SEQ bit.  Fix this by using & instead of |.

Detected by CoverityScan, CID#1466039 ("Operands don't affect result")
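
The bug is easy to reproduce in isolation. In this sketch the `TUNNEL_SEQ` value is illustrative, since the kernel defines its own big-endian flag. The two helper functions are hypothetical.

```c
#include <stdint.h>

#define TUNNEL_SEQ 0x08u /* illustrative bit value, not the kernel's definition */

/* Buggy form: (flags | TUNNEL_SEQ) is never zero, so this always increments. */
static uint32_t next_seq_buggy(uint16_t flags, uint32_t seq)
{
	return (flags | TUNNEL_SEQ) ? seq + 1 : seq;
}

/* Fixed form: increment only when the TUNNEL_SEQ bit is actually set. */
static uint32_t next_seq_fixed(uint16_t flags, uint32_t seq)
{
	return (flags & TUNNEL_SEQ) ? seq + 1 : seq;
}
```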

Fixes: 77a5196a ("gre: add sequence number for collect md mode.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: step sk->sk_drops when rcv buffer is full · 872619d8

Committed by GhantaKrishnamurthy MohanKrishna on Mar 21, 2018

Currently when tipc is unable to queue a received message on a
socket, the message is rejected back to the sender with error
TIPC_ERR_OVERLOAD. However, the application on this socket
has no knowledge about these discards.

In this commit, we increment the sk_drops counter when tipc
is unable to queue a received message, and export sk_drops
using tipc socket diagnostics.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: implement socket diagnostics for AF_TIPC · c30b70de

Committed by GhantaKrishnamurthy MohanKrishna on Mar 21, 2018

This commit adds socket diagnostics capability for AF_TIPC in netlink
family NETLINK_SOCK_DIAG in a new kernel module (diag.ko).

The following are key design considerations:
- config TIPC_DIAG has default y, like INET_DIAG.
- only requests with the NLM_F_DUMP flag are supported (dump all).
- tipc_sock_diag_req message is introduced to send filter parameters.
- the response attributes are TLVs, some of them nested.

To avoid exposing data structures between the diag and tipc modules and
avoid code duplication, the following additions are required:
- export tipc_nl_sk_walk function to reuse socket iterator.
- export tipc_sk_fill_sock_diag to fill the tipc diag attributes.
- create a sock_diag response message in __tipc_add_sock_diag defined
  in diag.c and use the above exported tipc_sk_fill_sock_diag
  to fill response.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: modify socket iterator for sock_diag · dfde331e

Committed by GhantaKrishnamurthy MohanKrishna on Mar 21, 2018

The current socket iterator function tipc_nl_sk_dump handles socket
locks and calls __tipc_nl_add_sk for each socket.
To reuse this logic in the sock_diag implementation, we make minor
modifications to make these functions generic, as described below.

In this commit, we add two new functions, __tipc_nl_sk_walk and
__tipc_nl_add_sk_info, and modify tipc_nl_sk_dump and __tipc_nl_add_sk
accordingly.

In __tipc_nl_sk_walk we:
1. acquire and release socket locks
2. for each socket, execute the specified callback function

In __tipc_nl_add_sk we:
- Move the netlink attribute insertion to __tipc_nl_add_sk_info.

tipc_nl_sk_dump calls tipc_nl_sk_walk with __tipc_nl_add_sk as argument.

sock_diag will use these generic functions in a later commit.

There is no functional change in this commit.
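
The walk-with-callback structure described above can be sketched generically. The types, `sk_walk()` and the example callback are hypothetical. Locking is modeled by a flag. The resumable `pos` index is an assumption: it stands in for the state a netlink dump keeps between calls when a callback stops early.

```c
#include <stddef.h>

/* Hypothetical socket record and callback type; not the TIPC structures. */
struct sock_rec { int portid; int locked; };

typedef int (*sk_cb)(struct sock_rec *sk, void *arg);

/* Generic walker: lock each socket, run the callback, unlock. Stops early
 * and leaves *pos at the failing socket if the callback returns non-zero. */
static int sk_walk(struct sock_rec *socks, size_t n, size_t *pos,
		   sk_cb cb, void *arg)
{
	for (; *pos < n; (*pos)++) {
		struct sock_rec *sk = &socks[*pos];
		int err;

		sk->locked = 1; /* stand-in for taking the socket lock */
		err = cb(sk, arg);
		sk->locked = 0; /* stand-in for releasing the socket lock */
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: count sockets and sum their port ids. */
struct sum_arg { int count; long sum; };

static int add_sk(struct sock_rec *sk, void *arg)
{
	struct sum_arg *s = arg;

	s->count++;
	s->sum += sk->portid;
	return 0;
}
```

Keeping the locking inside the walker and the per-socket work in the callback is what lets two users, here a netlink dump and sock_diag, share one iterator.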

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Remove top_hierarchy arg to devlink_resource_register · 14530746

Committed by David Ahern on Mar 20, 2018

top_hierarchy arg can be determined by comparing parent_resource_id to
DEVLINK_RESOURCE_ID_PARENT_TOP so it does not need to be a separate
argument.
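
The derivation can be expressed in one comparison. In this sketch `RESOURCE_ID_PARENT_TOP` is a local stand-in assumed to mirror DEVLINK_RESOURCE_ID_PARENT_TOP with the value 0. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESOURCE_ID_PARENT_TOP 0 /* assumed to mirror DEVLINK_RESOURCE_ID_PARENT_TOP */

/* Hypothetical: the former top_hierarchy argument, derived from the parent id. */
static bool is_top_hierarchy(uint64_t parent_resource_id)
{
	return parent_resource_id == RESOURCE_ID_PARENT_TOP;
}
```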

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Handle onlink flag with multipath routes · 68e2ffde

Committed by David Ahern on Mar 20, 2018

For multipath routes the ONLINK flag can be specified per nexthop in
rtnh_flags or globally in rtm_flags. Update ip6_route_multipath_add
to consider the ONLINK setting coming from rtnh_flags. On each iteration
over the nexthops, the config for the sibling route is initialized to the
global config and then the per-nexthop settings are overlaid. The flag is
OR'ed into fib6_config to handle the ONLINK flag coming from either
rtm_flags or rtnh_flags.
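
The per-nexthop overlay can be sketched with heavily reduced stand-in types. `struct route_cfg`, `struct nexthop`, `nexthop_cfg()` and `FLAG_ONLINK` are hypothetical. The flag value is illustrative, and the real fib6_config carries many more fields.

```c
#include <stdint.h>

#define FLAG_ONLINK 0x4u /* stands in for the ONLINK bit; value illustrative */

/* Hypothetical, heavily reduced stand-ins for fib6_config and a nexthop. */
struct route_cfg { uint32_t flags; int ifindex; };
struct nexthop   { uint32_t rtnh_flags; int ifindex; };

/* Build the config for one nexthop: start from the global config each time,
 * then overlay the per-nexthop settings. */
static struct route_cfg nexthop_cfg(const struct route_cfg *global,
				    const struct nexthop *nh)
{
	struct route_cfg cfg = *global;            /* re-initialized per nexthop */

	cfg.ifindex = nh->ifindex;                 /* per-nexthop overlay */
	cfg.flags |= nh->rtnh_flags & FLAG_ONLINK; /* ONLINK from either source */
	return cfg;
}
```

Because the copy starts from the global config on every iteration, an ONLINK bit set for one nexthop does not leak into the next, while a globally set bit applies to all of them.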

Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: sr: fix NULL pointer dereference when setting encap source address · 8936ef76

Committed by David Lebrun on Mar 20, 2018

When using seg6 in encap mode, we call ipv6_dev_get_saddr() to set the
source address of the outer IPv6 header, in case none was specified.
Using skb->dev can lead to BUG() when it is in an inconsistent state.
This patch uses the net_device attached to the skb's dst instead.

[940807.667429] BUG: unable to handle kernel NULL pointer dereference at 000000000000047c
[940807.762427] IP: ipv6_dev_get_saddr+0x8b/0x1d0
[940807.815725] PGD 0 P4D 0
[940807.847173] Oops: 0000 [#1] SMP PTI
[940807.890073] Modules linked in:
[940807.927765] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W        4.16.0-rc1-seg6bpf+ #2
[940808.028988] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26    09/06/2010
[940808.128128] RIP: 0010:ipv6_dev_get_saddr+0x8b/0x1d0
[940808.187667] RSP: 0018:ffff88043fd836b0 EFLAGS: 00010206
[940808.251366] RAX: 0000000000000005 RBX: ffff88042cb1c860 RCX: 00000000000000fe
[940808.338025] RDX: 00000000000002c0 RSI: ffff88042cb1c860 RDI: 0000000000004500
[940808.424683] RBP: ffff88043fd83740 R08: 0000000000000000 R09: ffffffffffffffff
[940808.511342] R10: 0000000000000040 R11: 0000000000000000 R12: ffff88042cb1c850
[940808.598012] R13: ffffffff8208e380 R14: ffff88042ac8da00 R15: 0000000000000002
[940808.684675] FS:  0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
[940808.783036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[940808.852975] CR2: 000000000000047c CR3: 00000004255fe000 CR4: 00000000000006e0
[940808.939634] Call Trace:
[940808.970041]  <IRQ>
[940808.995250]  ? ip6t_do_table+0x265/0x640
[940809.043341]  seg6_do_srh_encap+0x28f/0x300
[940809.093516]  ? seg6_do_srh+0x1a0/0x210
[940809.139528]  seg6_do_srh+0x1a0/0x210
[940809.183462]  seg6_output+0x28/0x1e0
[940809.226358]  lwtunnel_output+0x3f/0x70
[940809.272370]  ip6_xmit+0x2b8/0x530
[940809.313185]  ? ac6_proc_exit+0x20/0x20
[940809.359197]  inet6_csk_xmit+0x7d/0xc0
[940809.404173]  tcp_transmit_skb+0x548/0x9a0
[940809.453304]  __tcp_retransmit_skb+0x1a8/0x7a0
[940809.506603]  ? ip6_default_advmss+0x40/0x40
[940809.557824]  ? tcp_current_mss+0x24/0x90
[940809.605925]  tcp_retransmit_skb+0xd/0x80
[940809.654016]  tcp_xmit_retransmit_queue.part.17+0xf9/0x210
[940809.719797]  tcp_ack+0xa47/0x1110
[940809.760612]  tcp_rcv_established+0x13c/0x570
[940809.812865]  tcp_v6_do_rcv+0x151/0x3d0
[940809.858879]  tcp_v6_rcv+0xa5c/0xb10
[940809.901770]  ? seg6_output+0xdd/0x1e0
[940809.946745]  ip6_input_finish+0xbb/0x460
[940809.994837]  ip6_input+0x74/0x80
[940810.034612]  ? ip6_rcv_finish+0xb0/0xb0
[940810.081663]  ipv6_rcv+0x31c/0x4c0
...

Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Reported-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state · 191f86ca

Committed by David Lebrun on Mar 20, 2018

The seg6_build_state() function is called with RCU read lock held,
so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead.

[   92.770271] =============================
[   92.770628] WARNING: suspicious RCU usage
[   92.770921] 4.16.0-rc4+ #12 Not tainted
[   92.771277] -----------------------------
[   92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
[   92.772279]
[   92.772279] other info that might help us debug this:
[   92.772279]
[   92.773067]
[   92.773067] rcu_scheduler_active = 2, debug_locks = 1
[   92.773514] 2 locks held by ip/2413:
[   92.773765]  #0:  (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0
[   92.774377]  #1:  (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210
[   92.775065]
[   92.775065] stack backtrace:
[   92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12
[   92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
[   92.776608] Call Trace:
[   92.776852]  dump_stack+0x7d/0xbc
[   92.777130]  __schedule+0x133/0xf00
[   92.777393]  ? unwind_get_return_address_ptr+0x50/0x50
[   92.777783]  ? __sched_text_start+0x8/0x8
[   92.778073]  ? rcu_is_watching+0x19/0x30
[   92.778383]  ? kernel_text_address+0x49/0x60
[   92.778800]  ? __kernel_text_address+0x9/0x30
[   92.779241]  ? unwind_get_return_address+0x29/0x40
[   92.779727]  ? pcpu_alloc+0x102/0x8f0
[   92.780101]  _cond_resched+0x23/0x50
[   92.780459]  __mutex_lock+0xbd/0xad0
[   92.780818]  ? pcpu_alloc+0x102/0x8f0
[   92.781194]  ? seg6_build_state+0x11d/0x240
[   92.781611]  ? save_stack+0x9b/0xb0
[   92.781965]  ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0
[   92.782480]  ? seg6_build_state+0x11d/0x240
[   92.782925]  ? lwtunnel_build_state+0x1bd/0x210
[   92.783393]  ? ip6_route_info_create+0x687/0x1640
[   92.783846]  ? ip6_route_add+0x74/0x110
[   92.784236]  ? inet6_rtm_newroute+0x8a/0xd0

Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Mar 22, 2018 · 11 commits

rds: tcp: remove register_netdevice_notifier infrastructure. · bdf5bd7f

Committed by Sowmini Varadhan on Mar 19, 2018

The netns deletion path does not need to wait for all net_devices
to be unregistered before dismantling rds_tcp state for the netns
(we are able to dismantle this state on module unload even when
all net_devices are active so there is no dependency here).

This patch removes code related to netdevice notifiers and
refactors all the code needed to dismantle rds_tcp state
into a ->exit callback for the pernet_operations used with
register_pernet_device().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert nf_ct_net_ops · aa65f636

Committed by Kirill Tkhai on Mar 19, 2018

These pernet_operations register and unregister sysctl.
Also, there is inet_frags_exit_net() called in exit method,
which has to be safe after a5600024 "net: Fix hlist
corruptions in inet_evict_bucket()".

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert lowpan_frags_ops · 08012631

Committed by Kirill Tkhai on Mar 19, 2018

These pernet_operations register and unregister sysctl.
Also, there is inet_frags_exit_net() called in exit method,
which has to be safe after a5600024 "net: Fix hlist
corruptions in inet_evict_bucket()".

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert can_pernet_ops · 1ae77627

Committed by Kirill Tkhai on Mar 19, 2018

These pernet_operations create and destroy /proc entries
and cancel per-net timer.

Also, there are unneeded iterations over an empty list of net
devices, since all net devices must already have been moved
to init_net or unregistered by default_device_ops. This
was already mentioned here:

https://marc.info/?l=linux-can&m=150169589119335&w=2

So, it looks safe to make them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: fix idr leak in the error path of tcf_skbmod_init() · f29cdfbe

Committed by Davide Caratti on Mar 19, 2018

tcf_skbmod_init() can fail after the idr has been successfully reserved.
When this happens, every subsequent attempt to configure skbmod rules
using the same idr value will systematically fail with -ENOSPC, unless
the first attempt was done using the 'replace' keyword:

 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in tcf_skbmod_init(), ensuring that tcf_idr_release() is called
on the error path when the idr has been reserved, but not yet inserted.
Also, don't test 'ovr' in the error path, to avoid a 'replace' failure
implicitly becoming a 'delete' that leaks refcount in act_skbmod module:

 # rmmod act_skbmod; modprobe act_skbmod
 # tc action add action skbmod swap mac index 100
 # tc action add action skbmod swap mac continue index 100
 RTNETLINK answers: File exists
 We have an error talking to the kernel
 # tc action replace action skbmod swap mac continue index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action list action skbmod
 #
 # rmmod  act_skbmod
 rmmod: ERROR: Module act_skbmod is in use
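
The failure pattern shared by this series of fixes can be reproduced with a toy model. The table, `reserve()` and `action_init()` are hypothetical and only mimic the behaviour described in the commit messages: an index that is reserved and never released makes later adds with the same index fail with -ENOSPC. They are not the kernel's IDR or tc action API.

```c
#include <errno.h>
#include <stdbool.h>

#define NSLOTS 128

enum slot_state { SLOT_FREE, SLOT_RESERVED, SLOT_ACTIVE };

/* Toy model of an index table; not the kernel IDR API. */
struct index_table {
	enum slot_state slot[NSLOTS];
};

static int reserve(struct index_table *t, int index)
{
	if (index < 0 || index >= NSLOTS)
		return -EINVAL;
	if (t->slot[index] != SLOT_FREE)
		return -ENOSPC;             /* index already taken */
	t->slot[index] = SLOT_RESERVED;
	return 0;
}

/* 'fail' simulates an error after the index was reserved (e.g. -ENOMEM);
 * 'release_on_error' selects the fixed error path. */
static int action_init(struct index_table *t, int index, bool fail,
		       bool release_on_error)
{
	int err = reserve(t, index);

	if (err)
		return err;
	if (fail) {
		if (release_on_error)
			t->slot[index] = SLOT_FREE; /* fixed: give the index back */
		return -ENOMEM;                     /* buggy path leaves it RESERVED */
	}
	t->slot[index] = SLOT_ACTIVE;
	return 0;
}
```

With the buggy path, the second add for index 100 returns -ENOSPC, which matches the "No space left on device" output above. With the release path, the second add succeeds.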

Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: fix idr leak in the error path of tcf_vlan_init() · d7f20015

Committed by Davide Caratti on Mar 19, 2018

tcf_vlan_init() can fail after the idr has been successfully reserved.
When this happens, every subsequent attempt to configure vlan rules using
the same idr value will systematically fail with -ENOSPC, unless the first
attempt was done using the 'replace' keyword.

 # tc action add action vlan pop index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action vlan pop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action vlan pop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in tcf_vlan_init(), ensuring that tcf_idr_release() is called on
the error path when the idr has been reserved, but not yet inserted. Also,
don't test 'ovr' in the error path, to avoid a 'replace' failure implicitly
becoming a 'delete' that leaks refcount in act_vlan module:

 # rmmod act_vlan; modprobe act_vlan
 # tc action add action vlan push id 5 index 100
 # tc action replace action vlan push id 7 index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action list action vlan
 #
 # rmmod act_vlan
 rmmod: ERROR: Module act_vlan is in use

Fixes: 4c5b9d96 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update")
Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: fix idr leak in the error path of __tcf_ipt_init() · 1e46ef17

Committed by Davide Caratti on Mar 19, 2018

__tcf_ipt_init() can fail after the idr has been successfully reserved.
When this happens, subsequent attempts to configure xt/ipt rules using
the same idr value systematically fail with -ENOSPC:

 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING
         target:  LOG level warning prefix "test1" index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 Command "(null)" is unknown, try "tc actions help".
 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING
         target:  LOG level warning prefix "test1" index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 Command "(null)" is unknown, try "tc actions help".
 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING
         target:  LOG level warning prefix "test1" index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in the error path of __tcf_ipt_init(), calling tcf_idr_release()
in place of tcf_idr_cleanup(). Since tcf_ipt_release() can now be called
when tcfi_t is NULL, we also need to protect calls to ipt_destroy_target()
to avoid NULL pointer dereference.

Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: fix idr leak in the error path of tcp_pedit_init() · 94fa3f92

Committed by Davide Caratti on Mar 19, 2018

tcf_pedit_init() can fail to allocate 'keys' after the idr has been
successfully reserved. When this happens, subsequent attempts to configure
a pedit rule using the same idr value systematically fail with -ENOSPC:

 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in the error path of tcf_act_pedit_init(), calling
tcf_idr_release() in place of tcf_idr_cleanup().

Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: fix idr leak in the error path of tcf_act_police_init() · 5bf7f818

Committed by Davide Caratti on Mar 19, 2018

tcf_act_police_init() can fail after the idr has been successfully
reserved (e.g., qdisc_get_rtab() may return NULL). When this happens,
subsequent attempts to configure a police rule using the same idr value
systematically fail with -ENOSPC:

 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: No space left on device
 ...

Fix this in the error path of tcf_act_police_init(), calling
tcf_idr_release() in place of tcf_idr_cleanup().

Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: fix idr leak in the error path of tcf_simp_init() · 60e10b3a

Committed by Davide Caratti on Mar 19, 2018

If the kernel fails to duplicate 'sdata', creation of a new action fails
with -ENOMEM. However, subsequent attempts to install the same action
using the same value of 'index' systematically fail with -ENOSPC, and
that value of 'index' will no longer be usable by act_simple until rmmod /
insmod of act_simple.ko is done:

 # tc actions add action simple sdata hello index 100
 # tc actions list action simple

        action order 0: Simple <hello>
         index 100 ref 1 bind 0
 # tc actions flush action simple
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc actions flush action simple
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in the error path of tcf_simp_init(), calling tcf_idr_release()
in place of tcf_idr_cleanup().

Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: fix idr leak on the error path of tcf_bpf_init() · bbc09e78

Committed by Davide Caratti on Mar 19, 2018

When the following command sequence is entered:

 # tc action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100
 RTNETLINK answers: Invalid argument
 We have an error talking to the kernel
 # tc action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

act_bpf correctly refuses to install the first TC rule, because 31 is not
a valid instruction. However, it also refuses to install the second TC rule,
even though the BPF code is correct. Furthermore, it is no longer possible to
install any other rule with the same value of 'index' until the act_bpf
module is unloaded and inserted again. After the idr has been reserved, call
tcf_idr_release() instead of tcf_idr_cleanup() to fix this issue.

Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Mar 21, 2018 · 2 commits

mac80211: add ieee80211_hw flag for QoS NDP support · 7c181f4f

Committed by Ben Caradoc-Davies on Mar 19, 2018

Commit 7b6ddeaf ("mac80211: use QoS NDP for AP probing") added an
argument qos_ok to ieee80211_nullfunc_get to support QoS NDP. Despite
the claim in the commit log "Change all the drivers to *not* allow
QoS NDP for now, even though it looks like most of them should be OK
with that", this commit enables QoS NDP in response to beacons (see
change to mlme.c:ieee80211_send_nullfunc), causing ath9k_htc to lose
IP connectivity. See:
https://patchwork.kernel.org/patch/10241109/
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=891060

Introduce a hardware flag to allow such buggy drivers to override the
correct default behaviour of mac80211 of sending QoS NDP packets.

Signed-off-by: Ben Caradoc-Davies <ben@transient.nz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

ipv6: old_dport should be a __be16 in __ip6_datagram_connect() · 5f2fb802

Committed by Stefano Brivio on Mar 19, 2018

Fixes: 2f987a76 ("net: ipv6: keep sk status consistent after datagram connect failure")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Mar 20, 2018 · 1 commit

devlink: Remove redundant free on error path · 7fe4d6dc

Committed by Arkadi Sharshevsky on Mar 18, 2018

The current code performs an unneeded free. Remove the redundant skb
freeing on the error path.

Fixes: 1555d204 ("devlink: Support for pipeline debug (dpipe)")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>