提交 · 4f0087812648b7611157ae22954acfaed820d24e · openeuler / raspberrypi-kernel

06 1月, 2016 2 次提交

sctp: apply rhashtable api to send/recv path · 4f008781

由 Xin Long 提交于 12月 30, 2015

apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
and __sctp_lookup_association, it's invoked in the protection of sock
lock, it will be safe, but sctp_lookup_association need to call
rcu_read_lock() and to detect the t->dead to protect it.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4f008781

sctp: add the rhashtable apis for sctp global transport hashtable · d6c0256a

由 Xin Long 提交于 12月 30, 2015

tranport hashtbale will replace the association hashtable to do the
lookup for transport, and then get association by t->assoc, rhashtable
apis will be used because of it's resizable, scalable and using rcu.

lport + rport + paddr will be the base hashkey to locate the chain,
with net to protect one netns from another, then plus the laddr to
compare to get the target.

this patch will provider the lookup functions:
- sctp_epaddr_lookup_transport
- sctp_addrs_lookup_transport

hash/unhash functions:
- sctp_hash_transport
- sctp_unhash_transport

init/destroy functions:
- sctp_transport_hashtable_init
- sctp_transport_hashtable_destroy
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d6c0256a

05 1月, 2016 5 次提交

soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF · 538950a1

由 Craig Gallek 提交于 1月 04, 2016

Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.
Signed-off-by: NCraig Gallek <kraig@google.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

538950a1

soreuseport: fast reuseport UDP socket selection · e32ea7e7

由 Craig Gallek 提交于 1月 04, 2016

Include a struct sock_reuseport instance when a UDP socket binds to
a specific address for the first time with the reuseport flag set.
When selecting a socket for an incoming UDP packet, use the information
available in sock_reuseport if present.

This required adding an additional field to the UDP source address
equality function to differentiate between exact and wildcard matches.
The original use case allowed wildcard matches when checking for
existing port uses during bind.  The new use case of adding a socket
to a reuseport group requires exact address matching.

Performance test (using a machine with 2 CPU sockets and a total of
48 cores):  Create reuseport groups of varying size.  Use one socket
from this group per user thread (pinning each thread to a different
core) calling recvmmsg in a tight loop.  Record number of messages
received per second while saturating a 10G link.
  10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
  20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
  40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)

This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.
Signed-off-by: NCraig Gallek <kraig@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e32ea7e7

soreuseport: define reuseport groups · ef456144

由 Craig Gallek 提交于 1月 04, 2016

struct sock_reuseport is an optional shared structure referenced by each
socket belonging to a reuseport group.  When a socket is bound to an
address/port not yet in use and the reuseport flag has been set, the
structure will be allocated and attached to the newly bound socket.
When subsequent calls to bind are made for the same address/port, the
shared structure will be updated to include the new socket and the
newly bound socket will reference the group structure.

Usually, when an incoming packet was destined for a reuseport group,
all sockets in the same group needed to be considered before a
dispatching decision was made.  With this structure, an appropriate
socket can be found after looking up just one socket in the group.

This shared structure will also allow for more complicated decisions to
be made when selecting a socket (eg a BPF filter).

This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.
Signed-off-by: NCraig Gallek <kraig@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef456144

udp: properly support MSG_PEEK with truncated buffers · 197c949e

由 Eric Dumazet 提交于 12月 30, 2015

Backport of this upstream commit into stable kernels :
89c22d8c ("net: Fix skb csum races when peeking")
exposed a bug in udp stack vs MSG_PEEK support, when user provides
a buffer smaller than skb payload.

In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                 msg->msg_iov);
returns -EFAULT.

This bug does not happen in upstream kernels since Al Viro did a great
job to replace this into :
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.

For the time being, instead reverting Herbert Xu patch and add back
skb->ip_summed invalid changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.

This patch can be applied on recent kernels as it avoids a double
checksumming, then backported to stable kernels as a bug fix.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

197c949e

l2tp: rely on ppp layer for skb scrubbing · 98f40b3e

由 Guillaume Nault 提交于 12月 29, 2015

Since 79c441ae ("ppp: implement x-netns support"), the PPP layer
calls skb_scrub_packet() whenever the skb is received on the PPP
device. Manually resetting packet meta-data in the L2TP layer is thus
redundant.
Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

98f40b3e

31 12月, 2015 3 次提交

ethtool: Add phy statistics · f3a40945

由 Andrew Lunn 提交于 12月 30, 2015

Ethernet PHYs can maintain statistics, for example errors while idle
and receive errors. Add an ethtool mechanism to retrieve these
statistics, using the same model as MAC statistics.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f3a40945

sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close · 068d8bd3

由 Xin Long 提交于 12月 29, 2015

In sctp_close, sctp_make_abort_user may return NULL because of memory
allocation failure. If this happens, it will bypass any state change
and never free the assoc. The assoc has no chance to be freed and it
will be kept in memory with the state it had even after the socket is
closed by sctp_close().

So if sctp_make_abort_user fails to allocate memory, we should abort
the asoc via sctp_primitive_ABORT as well. Just like the annotation in
sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
"Even if we can't send the ABORT due to low memory delete the TCB.
This is a departure from our typical NOMEM handling".

But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
dereference the chunk pointer, and system crash. So we should add
SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
places where it adds SCTP_CMD_REPLY cmd.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

068d8bd3

net, socket, socket_wq: fix missing initialization of flags · 574aab1e

由 Nicolai Stange 提交于 12月 29, 2015

Commit ceb5d58b ("net: fix sock_wake_async() rcu protection") from
the current 4.4 release cycle introduced a new flags member in
struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA
from struct socket's flags member into that new place.

Unfortunately, the new flags field is never initialized properly, at least
not for the struct socket_wq instance created in sock_alloc_inode().

One particular issue I encountered because of this is that my GNU Emacs
failed to draw anything on my desktop -- i.e. what I got is a transparent
window, including the title bar. Bisection lead to the commit mentioned
above and further investigation by means of strace told me that Emacs
is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is
reproducible 100% of times and the fact that properly initializing the
struct socket_wq ->flags fixes the issue leads me to the conclusion that
somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags,
preventing my Emacs from receiving any SIGIO's due to data becoming
available and it got stuck.

Make sock_alloc_inode() set the newly created struct socket_wq's ->flags
member to zero.

Fixes: ceb5d58b ("net: fix sock_wake_async() rcu protection")
Signed-off-by: NNicolai Stange <nicstange@gmail.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

574aab1e

30 12月, 2015 5 次提交

openvswitch: Fix template leak in error cases. · 90c7afc9

由 Joe Stringer 提交于 12月 23, 2015

Commit 5b48bb8506c5 ("openvswitch: Fix helper reference leak") fixed a
reference leak on helper objects, but inadvertently introduced a leak on
the ct template.

Previously, ct_info.ct->general.use was initialized to 0 by
nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action()
returned successful. If an error occurred while adding the helper or
adding the action to the actions buffer, the __ovs_ct_free_action()
cleanup would use nf_ct_put() to free the entry; However, this relies on
atomic_dec_and_test(ct_info.ct->general.use). This reference must be
incremented first, or nf_ct_put() will never free it.

Fix the issue by acquiring a reference to the template immediately after
allocation.

Fixes: cae3a262 ("openvswitch: Allow attaching helpers to ct action")
Fixes: 5b48bb8506c5 ("openvswitch: Fix helper reference leak")
Signed-off-by: NJoe Stringer <joe@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

90c7afc9

NFC: nci: memory leak in nci_core_conn_create() · c6dc65d8

由 Dan Carpenter 提交于 12月 23, 2015

I've moved the check for "number_destination_params" forward
a few lines to avoid leaking "cmd".

Fixes: caa575a8 ('NFC: nci: fix possible crash in nci_core_conn_create')
Acked-by: NChristophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

c6dc65d8

nfc: netlink: HCI event connectivity implementation · 9afec6d3

由 Christophe Ricard 提交于 12月 23, 2015

Add support for missing HCI event EVT_CONNECTIVITY and forward
it to userspace.
Signed-off-by: NChristophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

9afec6d3

NFC: nci: Fix error check of nci_hci_create_pipe() result · 2a84193f

由 Christophe Ricard 提交于 12月 23, 2015

net/nfc/nci/hci.c: In function nci_hci_connect_gate :
net/nfc/nci/hci.c:679: warning: comparison is always false due to limited range of data type

In case of error, nci_hci_create_pipe() returns NCI_HCI_INVALID_PIPE,
and not a negative error code.

Correct the check to fix this.
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NChristophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

2a84193f

NFC: digital: Add Type4A tags support · ce2e56cd

由 Shikha Singh 提交于 11月 20, 2015

The definition of DIGITAL_PROTO_NFCA_RF_TECH is modified to support
ISO14443 Type4A tags. Without this change it is not possible to start
polling for ISO14443 Type4A tags from the initiator side.
Signed-off-by: NShikha Singh <shikha.singh@st.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

ce2e56cd

28 12月, 2015 2 次提交

sctp: label accepted/peeled off sockets · 3538a5c8

由 Marcelo Ricardo Leitner 提交于 12月 23, 2015

Accepted or peeled off sockets were missing a security label (e.g.
SELinux) which means that socket was in "unlabeled" state.

This patch clones the sock's label from the parent sock and resolves the
issue (similar to AF_BLUETOOTH protocol family).

Cc: Paul Moore <pmoore@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: NPaul Moore <paul@paul-moore.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3538a5c8

sctp: use GFP_USER for user-controlled kmalloc · 9ba0b963

由 Marcelo Ricardo Leitner 提交于 12月 23, 2015

Commit cacc0621 ("sctp: use GFP_USER for user-controlled kmalloc")
missed two other spots.

For connectx, as it's more likely to be used by kernel users of the API,
it detects if GFP_USER should be used or not.

Fixes: cacc0621 ("sctp: use GFP_USER for user-controlled kmalloc")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ba0b963

26 12月, 2015 1 次提交

ip_tunnel: Move stats update to iptunnel_xmit() · 039f5062

由 Pravin B Shelar 提交于 12月 24, 2015

By moving stats update into iptunnel_xmit(), we can simplify
iptunnel_xmit() usage. With this change there is no need to
call another function (iptunnel_xmit_stats()) to update stats
in tunnel xmit code path.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

039f5062

24 12月, 2015 2 次提交

bridge: use kobj_to_dev instead of to_dev · aeb7ed14

由 Geliang Tang 提交于 12月 23, 2015

kobj_to_dev has been defined in linux/device.h, so I replace to_dev
with it.
Signed-off-by: NGeliang Tang <geliangtang@163.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aeb7ed14

ipv6: honor ifindex in case we receive ll addresses in router advertisements · c1a9a291

由 Hannes Frederic Sowa 提交于 12月 23, 2015

Marc Haber reported we don't honor interface indexes when we receive link
local router addresses in router advertisements. Luckily the non-strict
version of ipv6_chk_addr already does the correct job here, so we can
simply use it to lighten the checks and use those addresses by default
without any configuration change.

Link: <http://permalink.gmane.org/gmane.linux.network/391348>
Reported-by: NMarc Haber <mh+netdev@zugschlus.de>
Cc: Marc Haber <mh+netdev@zugschlus.de>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c1a9a291

23 12月, 2015 8 次提交

tcp: honour SO_BINDTODEVICE for TW_RST case too · 271c3b9b

由 Florian Westphal 提交于 12月 21, 2015

Hannes points out that when we generate tcp reset for timewait sockets we
pretend we found no socket and pass NULL sk to tcp_vX_send_reset().

Make it cope with inet tw sockets and then provide tw sk.

This makes RSTs appear on correct interface when SO_BINDTODEVICE is used.

Packetdrill test case:
// want default route to be used, we rely on BINDTODEVICE
`ip route del 192.0.2.0/24 via 192.168.0.2 dev tun0`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
// test case still works due to BINDTODEVICE
0.001 setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "tun0", 4) = 0
0.100...0.200 connect(3, ..., ...) = 0

0.100 > S 0:0(0) <mss 1460,sackOK,nop,nop>
0.200 < S. 0:0(0) ack 1 win 32792 <mss 1460,sackOK,nop,nop>
0.200 > . 1:1(0) ack 1

0.210 close(3) = 0

0.210 > F. 1:1(0) ack 1 win 29200
0.300 < . 1:1(0) ack 2 win 46

// more data while in FIN_WAIT2, expect RST
1.300 < P. 1:1001(1000) ack 1 win 46

// fails without this change -- default route is used
1.301 > R 1:1(0) win 0
Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Acked-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

271c3b9b

tcp: send_reset: test for non-NULL sk first · e46787f0

由 Florian Westphal 提交于 12月 21, 2015

tcp_md5_do_lookup requires a full socket, so once we extend
_send_reset() to also accept timewait socket we would have to change

if (!sk && hash_location)

to something like

if ((!sk || !sk_fullsock(sk)) && hash_location) {
  ...
} else {
  (sk && sk_fullsock(sk)) tcp_md5_do_lookup()
}

Switch the two branches: check if we have a socket first, then
fall back to a listener lookup if we saw a md5 option (hash_location).
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Acked-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e46787f0

addrconf: always initialize sysctl table data · 5449a5ca

由 WANG Cong 提交于 12月 21, 2015

When sysctl performs restrict writes, it allows to write from
a middle position of a sysctl file, which requires us to initialize
the table data before calling proc_dostring() for the write case.

Fixes: 3d1bec99 ("ipv6: introduce secret_stable to ipv6_devconf")
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5449a5ca

net: tcp: deal with listen sockets properly in tcp_abort. · 2010b93e

由 Lorenzo Colitti 提交于 12月 22, 2015

When closing a listen socket, tcp_abort currently calls
tcp_done without clearing the request queue. If the socket has a
child socket that is established but not yet accepted, the child
socket is then left without a parent, causing a leak.

Fix this by setting the socket state to TCP_CLOSE and calling
inet_csk_listen_stop with the socket lock held, like tcp_close
does.

Tested using net_test. With this patch, calling SOCK_DESTROY on a
listen socket that has an established but not yet accepted child
socket results in the parent and the child being closed, such
that they no longer appear in sock_diag dumps.
Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2010b93e

ipv6/addrlabel: fix ip6addrlbl_get() · e459dfee

由 Andrey Ryabinin 提交于 12月 21, 2015

ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer.

Fix this by inverting ip6addrlbl_hold() check.

Fixes: 2a8cc6c8 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: NCong Wang <cwang@twopensource.com>
Acked-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e459dfee

switchdev: bridge: Pass ageing time as clock_t instead of jiffies · ef9cdd0f

由 Ido Schimmel 提交于 12月 21, 2015

The bridge's ageing time is offloaded to hardware when:
	1) A port joins a bridge
	2) The ageing time of the bridge is changed

In the first case the ageing time is offloaded as jiffies, but in the
second case it's offloaded as clock_t, which is what existing switchdev
drivers expect to receive.

Fixes: 6ac311ae ("Adding switchdev ageing notification on port bridged")
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef9cdd0f

RDS: don't pretend to use cpu notifiers · f2830d09

由 Sebastian Andrzej Siewior 提交于 12月 19, 2015

It looks like an attempt to use CPU notifier here which was never
completed. Nobody tried to wire it up completely since 2k9. So I unwind
this code and get rid of everything not required. Oh look! 19 lines were
removed while code still does the same thing.
Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Tested-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2830d09

net-sysfs: use to_net_dev in net_namespace() · 5c29482d

由 Geliang Tang 提交于 12月 22, 2015

Use to_net_dev() instead of open-coding it.
Signed-off-by: NGeliang Tang <geliangtang@163.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c29482d

20 12月, 2015 2 次提交

6lowpan: fix debugfs interface entry name · 92e17ee7

由 Alexander Aring 提交于 12月 15, 2015

This patches moves the debugfs interface related register after
netdevice register. The function lowpan_dev_debugfs_init will use
"dev->name" which can be before register_netdevice a format string.
The function register_netdevice will evaluate the format string if
necessary and replace "dev->name" to the real interface name.
Reported-by: NLukasz Duda <lukasz.duda@nordicsemi.no>
Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
Acked-by: NLukasz Duda <lukasz.duda@nordicsemi.no>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

92e17ee7

Bluetooth: use list_for_each_entry* · 7eb7404f

由 Geliang Tang 提交于 12月 18, 2015

Use list_for_each_entry*() instead of list_for_each*() to simplify
the code.
Signed-off-by: NGeliang Tang <geliangtang@163.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

7eb7404f

19 12月, 2015 9 次提交

openvswitch: correct encoding of set tunnel action attributes · e905eabc

由 Simon Horman 提交于 12月 18, 2015

In a set action tunnel attributes should be encoded in a
nested action.

I noticed this because ovs-dpctl was reporting an error
when dumping flows due to the incorrect encoding of tunnel attributes
in a set action.

Fixes: fc4099f1 ("openvswitch: Fix egress tunnel info.")
Signed-off-by: NSimon Horman <simon.horman@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e905eabc

ipip: ioctl: Remove superfluous IP-TTL handling. · 6d3c348a

由 Pravin B Shelar 提交于 12月 17, 2015

IP-TTL case is already handled in ip_tunnel_ioctl() API.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d3c348a

tcp: diag: add support for request sockets to tcp_abort() · 07f6f4a3

由 Eric Dumazet 提交于 12月 17, 2015

Adding support for SYN_RECV request sockets to tcp_abort()
is quite easy after our tcp listener rewrite.

Note that we also need to better handle listeners, or we might
leak not yet accepted children, because of a missing
inet_csk_listen_stop() call.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Tested-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

07f6f4a3

bpf: fix misleading comment in bpf_convert_filter · 23bf8807

由 Daniel Borkmann 提交于 12月 17, 2015

Comment says "User BPF's register A is mapped to our BPF register 6",
which is actually wrong as the mapping is on register 0. This can
already be inferred from the code itself. So just remove it before
someone makes assumptions based on that. Only code tells truth. ;)
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23bf8807

bpf: move clearing of A/X into classic to eBPF migration prologue · 8b614aeb

由 Daniel Borkmann 提交于 12月 17, 2015

Back in the days where eBPF (or back then "internal BPF" ;->) was not
exposed to user space, and only the classic BPF programs internally
translated into eBPF programs, we missed the fact that for classic BPF
A and X needed to be cleared. It was fixed back then via 83d5b7ef
("net: filter: initialize A and X registers"), and thus classic BPF
specifics were added to the eBPF interpreter core to work around it.

This added some confusion for JIT developers later on that take the
eBPF interpreter code as an example for deriving their JIT. F.e. in
f75298f5 ("s390/bpf: clear correct BPF accumulator register"), at
least X could leak stack memory. Furthermore, since this is only needed
for classic BPF translations and not for eBPF (verifier takes care
that read access to regs cannot be done uninitialized), more complexity
is added to JITs as they need to determine whether they deal with
migrations or native eBPF where they can just omit clearing A/X in
their prologue and thus reduce image size a bit, see f.e. cde66c2d
("s390/bpf: Only clear A and X for converted BPF programs"). In other
cases (x86, arm64), A and X is being cleared in the prologue also for
eBPF case, which is unnecessary.

Lets move this into the BPF migration in bpf_convert_filter() where it
actually belongs as long as the number of eBPF JITs are still few. It
can thus be done generically; allowing us to remove the quirk from
__bpf_prog_run() and to slightly reduce JIT image size in case of eBPF,
while reducing code duplication on this matter in current(/future) eBPF
JITs.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Reviewed-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Tested-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Yang Shi <yang.shi@linaro.org>
Acked-by: NYang Shi <yang.shi@linaro.org>
Acked-by: NZi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b614aeb

bpf: add bpf_skb_load_bytes helper · 05c74e5e

由 Daniel Borkmann 提交于 12月 17, 2015

When hacking tc programs with eBPF, one of the issues that come up
from time to time is to load addresses from headers. In eBPF as in
classic BPF, we have BPF_LD | BPF_ABS | BPF_{B,H,W} instructions that
extract a byte, half-word or word out of the skb data though helpers
such as bpf_load_pointer() (interpreter case).

F.e. extracting a whole IPv6 address could possibly look like ...

  union v6addr {
    struct {
      __u32 p1;
      __u32 p2;
      __u32 p3;
      __u32 p4;
    };
    __u8 addr[16];
  };

  [...]

  a.p1 = htonl(load_word(skb, off));
  a.p2 = htonl(load_word(skb, off +  4));
  a.p3 = htonl(load_word(skb, off +  8));
  a.p4 = htonl(load_word(skb, off + 12));

  [...]

  /* access to a.addr[...] */

This work adds a complementary helper bpf_skb_load_bytes() (we also
have bpf_skb_store_bytes()) as an alternative where the same call
would look like from an eBPF program:

  ret = bpf_skb_load_bytes(skb, off, addr, sizeof(addr));

Same verifier restrictions apply as in ffeedafb ("bpf: introduce
current->pid, tgid, uid, gid, comm accessors") case, where stack memory
access needs to be statically verified and thus guaranteed to be
initialized in first use (otherwise verifier cannot tell whether a
subsequent access to it is valid or not as it's runtime dependent).
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

05c74e5e

net: Allow accepted sockets to be bound to l3mdev domain · 6dd9a14e

由 David Ahern 提交于 12月 16, 2015

Allow accepted sockets to derive their sk_bound_dev_if setting from the
l3mdev domain in which the packets originated. A sysctl setting is added
to control the behavior which is similar to sk_mark and
sysctl_tcp_fwmark_accept.

This effectively allow a process to have a "VRF-global" listen socket,
with child sockets bound to the VRF device in which the packet originated.
A similar behavior can be achieved using sk_mark, but a solution using marks
is incomplete as it does not handle duplicate addresses in different L3
domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
domain provides a complete solution.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6dd9a14e

ipv6: addrconf: use stable address generator for ARPHRD_NONE · cc9da6cc

由 Bjørn Mork 提交于 12月 16, 2015

Add a new address generator mode, using the stable address generator
with an automatically generated secret. This is intended as a default
address generator mode for device types with no EUI64 implementation.
The new generator is used for ARPHRD_NONE interfaces initially, adding
default IPv6 autoconf support to e.g. tun interfaces.

If the addrgenmode is set to 'random', either by default or manually,
and no stable secret is available, then a random secret is used as
input for the stable-privacy address generator. The secret can be
read and modified like manually configured secrets, using the proc
interface. Modifying the secret will change the addrgen mode to
'stable-privacy' to indicate that it operates on a known secret.

Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
a known secret is available when the device is created, then the mode
will default to 'stable-privacy' as before. The mode can be manually
set to 'random' but it will behave exactly like 'stable-privacy' in
this case. The secret will not change.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: 吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc9da6cc

ila: add NETFILTER dependency · 8cb964da

由 Arnd Bergmann 提交于 12月 18, 2015

The recently added generic ILA translation facility fails to
build when CONFIG_NETFILTER is disabled:

net/ipv6/ila/ila_xlat.c:229:20: warning: 'struct nf_hook_state' declared inside parameter list
net/ipv6/ila/ila_xlat.c:235:27: error: array type has incomplete element type 'struct nf_hook_ops'
static struct nf_hook_ops ila_nf_hook_ops[] __read_mostly = {

This adds an explicit Kconfig dependency to avoid that case.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Fixes: 7f00feaf ("ila: Add generic ILA translation facility")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8cb964da

18 12月, 2015 1 次提交

netfilter: nft_ct: include direction when dumping NFT_CT_L3PROTOCOL key · d5f79b6e

由 Florian Westphal 提交于 12月 18, 2015

one nft userspace test case fails with

'ct l3proto original ipv4' mismatches 'ct l3proto ipv4'

... because NFTA_CT_DIRECTION attr is missing.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

d5f79b6e