Commits · ea3793ee29d3621faf857fa8ef5425e9ff9a756d · openeuler / Kernel

07 Dec, 2015 · 1 commit

core: enable more fine-grained datagram reception control · ea3793ee

Committed by Rainer Weikusat on Dec 06, 2015

The __skb_recv_datagram routine in core/datagram.c provides a general
skb reception facility supposed to be utilized by protocol modules
providing datagram sockets. It encompasses both the actual recvmsg code
and a surrounding 'sleep until data is available' loop. This is
inconvenient if a protocol module has to use additional locking in order
to maintain some per-socket state the generic datagram socket code is
unaware of (as the af_unix code does). The patch below moves the recvmsg
proper code into a new __skb_try_recv_datagram routine which doesn't
sleep and renames wait_for_more_packets to
__skb_wait_for_more_packets, both routines being exported interfaces. The
original __skb_recv_datagram routine is reimplemented on top of these
two functions such that its user-visible behaviour remains unchanged.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
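
The split described in this commit message can be pictured with a small user-space sketch. It is not the kernel code: `struct queue`, `try_recv()`, `wait_for_more()` and `recv_blocking()` are hypothetical stand-ins for the socket receive queue, `__skb_try_recv_datagram`, `__skb_wait_for_more_packets` and `__skb_recv_datagram`, and the waiting step only simulates a packet arriving.

```c
#include <stddef.h>

/* Hypothetical stand-in for a socket receive queue (not kernel code). */
struct queue {
	int items[8];
	size_t head, tail;
	int pending;	/* packets that "arrive" while we wait */
};

/* Non-sleeping step: take one item if present, else report -1. */
static int try_recv(struct queue *q, int *out)
{
	if (q->head == q->tail)
		return -1;	/* nothing queued, caller may decide to wait */
	*out = q->items[q->head++ % 8];
	return 0;
}

/* Waiting step: here it only simulates a packet arriving.
 * Returns nonzero when waiting is pointless (nothing will arrive). */
static int wait_for_more(struct queue *q)
{
	if (q->pending == 0)
		return 1;
	q->items[q->tail++ % 8] = 100 + q->pending;
	q->pending--;
	return 0;
}

/* The blocking receive, rebuilt on top of the two steps. */
static int recv_blocking(struct queue *q, int *out)
{
	do {
		if (try_recv(q, out) == 0)
			return 0;
	} while (!wait_for_more(q));
	return -1;
}
```

A protocol that needs its own locking could call the two steps directly and take or drop its lock between them, which is presumably the use case the message has in mind for af_unix.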

06 Dec, 2015 · 3 commits

net: constify netif_is_* helpers net_device param · b618aaa9

Committed by Jiri Pirko on Dec 04, 2015

As suggested by Eric, these helpers should have const dev param.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

batman-adv: Act on NETDEV_*_TYPE_CHANGE events · a1a66b11

Committed by Andrew Lunn on Dec 03, 2015

A network interface can change type. It may change from a type which
batman does not support, e.g. hdlc, to one it does, e.g. hdlc-eth.
When an interface changes type, it sends two notifications. Handle
these notifications.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Only act upon NETDEV_*_TYPE_CHANGE if we have ipv6 addresses · 3ef0952c

Committed by Andrew Lunn on Dec 03, 2015

An interface changing type may not have IPv6 addresses. Don't
call the address configuration type change in this case.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Dec, 2015 · 18 commits

tipc: fix node reference count bug · dc8d1eb3

Committed by Jon Paul Maloy on Dec 02, 2015

Commit 5405ff6e ("tipc: convert node lock to rwlock")
introduced a bug to the node reference counter handling. When a
message is successfully sent in the function tipc_node_xmit(),
we return directly after releasing the node lock, instead of
continuing and decrementing the node reference counter as we
should do.

This commit fixes this bug.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
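
The shape of the bug described here, an early return that skips the reference drop, can be shown in a reduced user-space sketch. Everything in it is hypothetical (`struct tnode`, `node_get()`, `node_put()`, `xmit_buggy()`, `xmit_fixed()`); it only mirrors the control flow the message describes, not the actual tipc_node_xmit() code.

```c
/* Hypothetical reduced model of a reference-counted node. */
struct tnode {
	int refcnt;
};

static void node_get(struct tnode *n) { n->refcnt++; }
static void node_put(struct tnode *n) { n->refcnt--; }

/* Buggy shape: the success path returns early and skips node_put(). */
static int xmit_buggy(struct tnode *n, int send_ok)
{
	node_get(n);
	if (send_ok)
		return 0;	/* reference is leaked here */
	node_put(n);
	return -1;
}

/* Fixed shape: every path falls through to the same node_put(). */
static int xmit_fixed(struct tnode *n, int send_ok)
{
	int rc = -1;

	node_get(n);
	if (send_ok)
		rc = 0;		/* no early return */
	node_put(n);
	return rc;
}
```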

pppox: use standard module auto-loading feature · 681b4d88

Committed by Guillaume Nault on Dec 02, 2015

  * Register PF_PPPOX with pppox module rather than with pppoe,
    so that pppoe doesn't get loaded for any PF_PPPOX socket.

  * Register PX_PROTO_* with standard MODULE_ALIAS_NET_PF_PROTO()
    instead of using pppox's own naming scheme.

  * While there, add auto-loading feature for pptp.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Add Makefile and Kconfig · 8a2a2029

Committed by Asias He on Dec 02, 2015

Enable virtio-vsock and vhost-vsock.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Introduce virtio-vsock.ko · 32e61b06

Committed by Asias He on Dec 02, 2015

VM sockets virtio transport implementation. This module runs in guest
kernel.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Introduce virtio-vsock-common.ko · 80a19e33

Committed by Asias He on Dec 02, 2015

This module contains the common code and header files for the following
virtio-vsock and virtio-vhost kernel modules.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Introduce vsock_find_unbound_socket and vsock_bind_dgram_generic · 357ab223

Committed by Asias He on Dec 02, 2015

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: support for dead routes · c89359a4

Committed by Roopa Prabhu on Dec 01, 2015

Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection.

Unlike ip routes, mpls routes are not deleted when the route goes
dead. This is current mpls behaviour and this patch does not change
that. With this patch however, routes will be marked dead.
Dead routes are not notified to userspace (this is consistent with ipv4
routes).

dead routes:
-----------
$ip -f mpls route show
100
    nexthop as to 200 via inet 10.1.1.2  dev swp1
    nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link set dev swp1 down

$ip link show dev swp1
4: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode
DEFAULT group default qlen 1000
    link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
    nexthop as to 200 via inet 10.1.1.2  dev swp1 dead linkdown
    nexthop as to 700 via inet 10.1.1.6  dev swp2

linkdown routes:
----------------
$ip -f mpls route show
100
    nexthop as to 200 via inet 10.1.1.2  dev swp1
    nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link show dev swp1
4: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
    link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

/* carrier goes down */
$ip link show dev swp1
4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
    nexthop as to 200 via inet 10.1.1.2  dev swp1 linkdown
    nexthop as to 700 via inet 10.1.1.6  dev swp2
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: fix qdisc_tree_decrease_qlen() races · 4eaf3b84

Committed by Eric Dumazet on Dec 01, 2015

qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
devices.

One problem is that it updates sch->q.qlen and sch->qstats.drops
on the mq/mqprio root qdisc, while it should not: Daniele
reported underflow errors:
[  681.774821] PAX: sch->q.qlen: 0 n: 1
[  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
[  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
[  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
[  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
[  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
[  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
[  681.774962] Call Trace:
[  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
[  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
[  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
[  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
[  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
[  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
[  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
[  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
[  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
[  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
[  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
[  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
[  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
[  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70

mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.

A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.

Fix the first problem by adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal
as soon as it meets a child of an mq/mqprio qdisc.

The second problem can be fixed by RCU protection.
Qdiscs are already freed after an RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use the appropriate RCU list variants.

A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.
Reported-by: Daniele Fucini <dfucini@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
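
The first fix, aborting the upward traversal at a flagged child so that the multiqueue root's counters are left alone, can be sketched in user space as follows. `struct qnode`, `NOPARENT_FLAG` and `tree_decrease_qlen()` are hypothetical simplifications; the real function looks parents up by handle and also handles drops and class notifications, none of which is modelled here.

```c
#include <stddef.h>

#define NOPARENT_FLAG 0x1u	/* hypothetical stand-in for TCQ_F_NOPARENT */

/* Hypothetical, much reduced model of a qdisc in a hierarchy. */
struct qnode {
	struct qnode *parent;
	unsigned int flags;
	unsigned int qlen;
};

/* Propagate a queue-length decrease to the ancestors of n, stopping at
 * a node flagged as a direct child of the multiqueue root, so that the
 * root's own counter is never touched. */
static void tree_decrease_qlen(struct qnode *n, unsigned int count)
{
	if (count == 0)
		return;
	while (n->parent != NULL) {
		if (n->flags & NOPARENT_FLAG)
			break;
		n = n->parent;
		n->qlen -= count;
	}
}
```

With a leaf below a flagged child of the root, only the child's counter changes; without the flag check the root's unsigned counter would be decremented too and could wrap, which matches the underflow report quoted in the message.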

net: ipv6: restrict hop_limit sysctl setting to range [1; 255] · d6df198d

Committed by Phil Sutter on Dec 01, 2015

Setting a value bigger than 255 resulted in using only the lower eight
bits of that value as it is assigned to the u8 header field. To avoid
this unexpected result, reject such values.

Setting a value of zero is technically possible, but hosts receiving
such a packet have to treat it like hop_limit was set to one, according
to RFC2460. Therefore I don't see a use-case for that.

Setting a route's hop_limit to zero in iproute2 means to use the sysctl
default, which is not the case here: Setting e.g.
net.conf.eth0.hop_limit=0 will not make the kernel use
net.conf.all.hop_limit for outgoing packets on eth0. To avoid these
kinds of confusion, reject zero.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
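
The truncation this commit guards against is ordinary narrowing to an 8-bit field. A minimal user-space illustration follows; `store_hop_limit()` and `hop_limit_valid()` are hypothetical helpers written for this example, not functions from the patch.

```c
#include <stdint.h>

/* Storing a wider value into an 8-bit field keeps only the low byte. */
static uint8_t store_hop_limit(int requested)
{
	return (uint8_t)requested;
}

/* Hypothetical validator mirroring the [1; 255] rule: 0 if ok, -1 if not. */
static int hop_limit_valid(int requested)
{
	return (requested >= 1 && requested <= 255) ? 0 : -1;
}
```

For instance, a requested value of 300 would be stored as 44, and 256 as 0, which is why rejecting out-of-range values up front is less surprising than silently truncating them.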

openvswitch: fix hangup on vxlan/gre/geneve device deletion · 13175303

Committed by Paolo Abeni on Dec 01, 2015

Each openvswitch tunnel vport (vxlan, gre, geneve) holds a reference
to the underlying tunnel device, but never releases it when that
device is deleted.
Deleting the underlying device via the ip tool causes the kernel to
hang in the netdev_wait_allrefs() loop.
This commit ensures that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.

Fixes: 614732ea ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e68 ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: igmp: Allow removing groups from a removed interface · 4eba7bb1

Committed by Andrew Lunn on Dec 01, 2015

When a multicast group is joined on a socket, a struct ip_mc_socklist
is appended to the socket's mc_list containing information about the
joined group.

If the interface is hot unplugged, this entry becomes stale. Prior to
commit 52ad353a ("igmp: fix the problem when mc leave group") it
was possible to remove the stale entry by performing an
IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
the interface. However, this fix enforces that the interface must
still exist. Thus with time, the number of stale entries grows, until
sysctl_igmp_max_memberships is reached and then it is not possible to
join any more groups.

The previous patch fixes an issue where an IP_DROP_MEMBERSHIP is
performed without specifying the interface, either by ifindex or ip
address. However here we do supply one of these. So loosen the
restriction on device existence to only apply when the interface has
not been specified. This then restores the ability to clean up the
stale entries.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 52ad353a ("igmp: fix the problem when mc leave group")
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: sctp: implement sctp_v6_destroy_sock() · 602dd62d

Committed by Eric Dumazet on Dec 01, 2015

Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.

We need to call inet6_destroy_sock() to properly release
inet6 specific fields.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: introduce change lower state notifier · 04d48266

Committed by Jiri Pirko on Dec 03, 2015

When a lower device such as a bonding slave or a team/bridge port changes its
state, it is useful for others to notice this change. Currently this is
implemented specifically for bonding as the NETDEV_BONDING_INFO notifier. This
patch aims to replace this specific usage and make this more generic to
be used for all upper-lower devices.

Introduce NETDEV_CHANGELOWERSTATE netdev notifier type and
netdev_lower_state_changed() helper.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add possibility to pass information about upper device via notifier · 29bf24af

Committed by Jiri Pirko on Dec 03, 2015

Sometimes the drivers and other code would find it handy to know some
internal information about upper device being changed. So allow upper-code
to pass information down to notifier listeners during linking.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: propagate upper priv via netdev_master_upper_dev_link · 6dffb044

Committed by Jiri Pirko on Dec 03, 2015

Eliminate netdev_master_upper_dev_link_private and pass priv directly as
a parameter of netdev_master_upper_dev_link.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Check CHANGEUPPER notifier return value · b03804e7

Committed by Ido Schimmel on Dec 03, 2015

switchdev drivers reflect the newly requested topology to hardware when
CHANGEUPPER is received, after software links were already formed.
However, the operation can fail and user will not be notified, as the
return value of the notifier is not checked.

Add this check and rollback software links if necessary.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
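
The check-and-rollback pattern this message describes can be sketched outside the kernel as below. The names (`struct dev`, `changeupper_fn`, `link_upper()`, `notify_ok()`, `notify_fail()`) are hypothetical and the model is deliberately tiny: one pointer stands in for the software link, and a callback stands in for the CHANGEUPPER notifier chain.

```c
#include <stddef.h>

/* Hypothetical reduced model of a net device with one upper link. */
struct dev {
	struct dev *upper;	/* NULL when not linked */
};

/* Hypothetical notifier callback: 0 on success, negative on failure. */
typedef int (*changeupper_fn)(struct dev *lower, struct dev *upper);

static int link_upper(struct dev *lower, struct dev *upper,
		      changeupper_fn notify)
{
	int err;

	lower->upper = upper;		/* form the software link first */
	err = notify(lower, upper);	/* listeners may program hardware */
	if (err) {
		lower->upper = NULL;	/* roll back the software link */
		return err;
	}
	return 0;
}

/* Two trivial listeners used to exercise both paths. */
static int notify_ok(struct dev *l, struct dev *u)
{
	(void)l; (void)u;
	return 0;
}

static int notify_fail(struct dev *l, struct dev *u)
{
	(void)l; (void)u;
	return -1;
}
```

The point of the sketch is only the ordering: the link exists when listeners run, and it is undone if any listener reports failure, so the caller sees the error instead of a silently inconsistent topology.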

ipv6: kill sk_dst_lock · 6bd4f355

Committed by Eric Dumazet on Dec 02, 2015

While testing the np->opt RCU conversion, I found that UDP/IPv6 was
using a mixture of xchg() and sk_dst_lock to protect concurrent changes
to sk->sk_dst_cache, leading to possible corruptions and crashes.

ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
way to fix the mess is to remove sk_dst_lock completely, as we did for
IPv4.

__ip6_dst_store() and ip6_dst_store() share same implementation.

sk_setup_caps() being called with socket lock being held or not,
we have to use sk_dst_set() instead of __sk_dst_set()

Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
in ip6_dst_store() before the sk_setup_caps(sk, dst) call.

This is because ip6_dst_store() can be called from process context,
without any lock held.

As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
from another cpu doing a concurrent ip6_dst_store()

Doing the dst dereference before doing the install is needed to make
sure no use after free would trigger.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: sctp: add rcu protection around np->opt · c836a8ba

Committed by Eric Dumazet on Dec 02, 2015

This patch completes the work I did in commit 45f6fad8
("ipv6: add complete rcu protection around np->opt"), as I missed
sctp part.

This simply makes sure np->opt is used with proper RCU locking
and accessors.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Dec, 2015 · 8 commits

net/neighbour: fix crash at dumping device-agnostic proxy entries · 6adc5fd6

Committed by Konstantin Khlebnikov on Dec 01, 2015

Proxy entries could have null pointer to net-device.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Fixes: 84920c14 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2")
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: suppress too verbose messages in tcp_send_ack() · 7450aaf6

Committed by Eric Dumazet on Nov 30, 2015

If tcp_send_ack() can not allocate skb, we properly handle this
and setup a timer to try later.

Use __GFP_NOWARN to avoid polluting syslog in the case host is
under memory pressure, so that pertinent messages are not lost under
a flood of useless information.

sk_gfp_atomic() can use its gfp_mask argument (all callers currently
were using GFP_ATOMIC before this patch)

We rename sk_gfp_atomic() to sk_gfp_mask() to clearly express this
function now takes into account its second argument (gfp_mask)

Note that when tcp_transmit_skb() is called with clone_it set to false,
we do not attempt memory allocations, so can pass a 0 gfp_mask, which
most compilers can emit faster than a non zero or constant value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
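
The renamed helper's behaviour, taking the caller's mask into account instead of a fixed one, might look roughly like the user-space sketch below. This is an assumption-laden illustration: the bit values, `struct sock_sketch` and `gfp_mask_for()` are invented here, and the idea that the helper folds in a per-socket "memalloc" bit is an assumption about the kernel helper rather than something stated in the message.

```c
typedef unsigned int gfp_t;

/* Illustrative bit values only; the real kernel constants differ. */
#define X_GFP_ATOMIC	0x01u
#define X_GFP_NOWARN	0x02u
#define X_GFP_MEMALLOC	0x04u

/* Hypothetical reduced socket: only its allocation flags. */
struct sock_sketch {
	gfp_t allocation;
};

/* Combine the caller's mask with the socket's MEMALLOC bit, if set
 * (assumed behaviour, see the note above). */
static gfp_t gfp_mask_for(const struct sock_sketch *sk, gfp_t gfp_mask)
{
	return gfp_mask | (sk->allocation & X_GFP_MEMALLOC);
}
```

Under these assumptions a caller such as the ACK path could pass the atomic flag together with the no-warn flag, while a caller that never allocates could pass 0.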

sctp: use GFP_USER for user-controlled kmalloc · cacc0621

Committed by Marcelo Ricardo Leitner on Nov 30, 2015

Dmitry Vyukov reported that the user could trigger a kernel warning by
using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that
value directly affects the value used as a kmalloc() parameter.

This patch thus switches the allocation flags for all kmalloc calls with a
user-controllable size to GFP_USER to put some more restrictions on them and
also disables the warnings, as they are not necessary.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add complete rcu protection around np->opt · 45f6fad8

Committed by Eric Dumazet on Nov 29, 2015

This patch addresses multiple problems :

UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program demonstrating
use-after-free.

Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())

This patch adds full RCU protection to np->opt
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac80211: fix off-channel mgmt-tx uninitialized variable usage · c1df932c

Committed by Johannes Berg on Nov 27, 2015

In the last change here, I neglected to update the cookie in one code
path: when a mgmt-tx has no real cookie sent to userspace as it doesn't
wait for a response, but is off-channel. The original code used the SKB
pointer as the cookie and always assigned the cookie to the TX SKB in
ieee80211_start_roc_work(), but my change turned this around and made
the code rely on a valid cookie being passed in.

Unfortunately, the off-channel no-wait TX path wasn't assigning one at
all, resulting in an uninitialized stack value being used. This wasn't
handed back to userspace as a cookie (since in the no-wait case there
isn't a cookie), but it was tested for non-zero to distinguish between
mgmt-tx and off-channel.

Fix this by assigning a dummy non-zero cookie unconditionally, and get
rid of a misleading comment and some dead code while at it. I'll clean
up the ACK SKB handling separately later.

Fixes: 3b79af97 ("mac80211: stop using pointers as userspace cookies")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: do not actively scan DFS channels · 4e39ccac

Committed by Antonio Quartulli on Nov 21, 2015

DFS channels should not be actively scanned as we can't be sure
if we are allowed or not.

If the current channel is in the DFS band, active scan might be
performed after CSA, but we have no guarantee about other channels,
therefore it is safer to prevent active scanning at all.

Cc: stable@vger.kernel.org
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: don't teardown sdata on sdata stop · 835112b2

Committed by Eliad Peller on Nov 17, 2015

Interfaces are being initialized (setup) on addition,
and torn down on removal.

However, p2p device is being torn down when stopped,
resulting in the next p2p start operation being done
on uninitialized interface.

Solve it by calling ieee80211_teardown_sdata() only
on interface removal (for the non-netdev case).
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[squashed in fix to call teardown after unregister]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

openvswitch: properly refcount vport-vxlan module · 83e4bf7a

Committed by Paolo Abeni on Nov 30, 2015

After 614732ea, no refcount is maintained for the vport-vxlan module.
This allows the userspace to remove such module while vport-vxlan
devices still exist, which leads to later oops.

v1 -> v2:
 - move vport 'owner' initialization in ovs_vport_ops_register()
   and make such function a macro

Fixes: 614732ea ("openvswitch: Use regular VXLAN net_device device")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Dec, 2015 · 4 commits

net: fix sock_wake_async() rcu protection · ceb5d58b

Committed by Eric Dumazet on Nov 29, 2015

Dmitry provided a syzkaller (http://github.com/google/syzkaller)
triggering a fault in sock_wake_async() when async IO is requested.

Said program stressed af_unix sockets, but the issue is generic
and should be addressed in core networking stack.

The problem is that by the time sock_wake_async() is called,
we should not access the @flags field of 'struct socket',
as the inode containing this socket might be freed without
further notice, and without RCU grace period.

We already maintain an RCU protected structure, "struct socket_wq"
so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it
is the safe route.

It also reduces number of cache lines needing dirtying, so might
provide a performance improvement anyway.

In followup patches, we might move the remaining flags (SOCK_NOSPACE,
SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket'
be mostly read and shared between cpus.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA · 9cd3e072

Committed by Eric Dumazet on Nov 29, 2015

This patch is a cleanup to make following patch easier to
review.

Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()

To ease backports, we rename both constants.

Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int nr, struct sock *sk) are added so that
the following patch can change their implementation.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" · 304d888b

Committed by Nicolas Dichtel on Nov 27, 2015

This reverts commit ab450605.

In IPv6, we cannot inherit the dst of the original dst. ndisc packets
are IPv6 packets and may take another route than the original packet.

This patch breaks the following scenario: a packet comes from eth0 and
is forwarded through vxlan1. The encapsulated packet triggers an NS
which cannot be sent because of the wrong route.

CC: Jiri Benc <jbenc@redhat.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix: use wq_has_sleeper in unix_dgram_recvmsg · 77b75f4d

Committed by Rainer Weikusat on Nov 26, 2015

The current unix_dgram_recvmsg does a wake up for every received
datagram. This seems wasteful as only SOCK_DGRAM client sockets in an
n:1 association with a server socket will ever wait because of the
associated condition. The patch below changes the function such that the
wake up only happens if wq_has_sleeper indicates that someone actually
wants to be notified. Testing with SOCK_SEQPACKET and SOCK_DGRAM sockets
seems to confirm that this is an improvement.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Dec, 2015 · 6 commits

tcp: initialize tp->copied_seq in case of cross SYN connection · 142a2e7e

Committed by Eric Dumazet on Nov 26, 2015

Dmitry provided a syzkaller (http://github.com/google/syzkaller)
generated program that triggers the WARNING at
net/ipv4/tcp.c:1729 in tcp_recvmsg() :

WARN_ON(tp->copied_seq != tp->rcv_nxt &&
        !(flags & (MSG_PEEK | MSG_TRUNC)));

His program is specifically attempting a Cross SYN TCP exchange,
that we support (for the pleasure of hackers ?), but it looks like we
lack proper tp->copied_seq initialization.

Thanks again Dmitry for your report and testing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipmr: add mfc newroute/delroute netlink support · ccbb0aa6

Committed by Nikolay Aleksandrov on Nov 26, 2015

This patch adds support to add and remove MFC entries. It uses the
same attributes as the already present dump support in order to be
consistent. There's one new entry - RTA_PREFSRC, it's used to denote an
MFC_PROXY entry (see MRT_ADD_MFC vs MRT_ADD_MFC_PROXY).
The already existing infrastructure is used to create and delete the
entries, the netlink message gets converted internally to a struct mfcctl
which is used with ipmr_mfc_add/delete.
The other used attributes are:
RTA_IIF - used for mfcc_parent (when adding it's required to be valid)
RTA_SRC - used for mfcc_origin
RTA_DST - used for mfcc_mcastgrp
RTA_TABLE - the MRT table id
RTA_MULTIPATH - the "oifs" ttl array
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipmr: fix setsockopt error return · 42e6b89c

Committed by Nikolay Aleksandrov on Nov 26, 2015

We can have both errors and we'll return the second one; fix it to
return one error at a time, as is normal. I've overlooked this in my
previous set.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipmr: move pimsm_enabled to pim.h and rename · 1973a4ea

Committed by Nikolay Aleksandrov on Nov 26, 2015

Move the inline pimsm_enabled() to pim.h and rename it to
ipmr_pimsm_enabled to show it's for the ipv4 ipmr code since pim.h is
used by IPv6 too.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipmr: move struct mr_table and VIF_EXISTS to mroute.h · 5ea1f132

Committed by Nikolay Aleksandrov on Nov 26, 2015

Move the definitions of VIF_EXISTS() and struct mr_table to mroute.h
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipmr: remove unused MFC_NOTIFY flag and make the flags enum · 06bd6c03

Committed by Nikolay Aleksandrov on Nov 26, 2015

MFC_NOTIFY was introduced in kernel 2.1.68 but afaik it hasn't been used
and I couldn't find any users currently so just remove it. Only
MFC_STATIC is left, so move it into an enum, add a description and use
BIT().
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
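
Moving a lone flag into an enum built with BIT() could look like the sketch below. The `BIT()` definition is a user-space stand-in for the kernel macro, `mfc_is_static()` is a hypothetical helper added only to show a use of the flag, and the comment wording is not taken from the patch.

```c
/* User-space stand-in for the kernel's BIT() macro. */
#define BIT(n) (1UL << (n))

/* Flags for a multicast forwarding cache entry (sketch). */
enum {
	MFC_STATIC = BIT(0),	/* entry was added statically */
};

/* Hypothetical helper: test whether the static flag is set. */
static int mfc_is_static(unsigned long flags)
{
	return (flags & MFC_STATIC) != 0;
}
```

Keeping the flag in an enum with an explicit bit position leaves room for further flags to be added as BIT(1), BIT(2), and so on, without renumbering.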
