提交 · df560056d960a3e164c179d89770d5a51b798537 · openanolis / cloud-kernel

07 1月, 2017 1 次提交

udp: inuse checks can quit early for reuseport · df560056

由 Eric Garver 提交于 1月 05, 2017

UDP lib inuse checks will walk the entire hash bucket to check if the
portaddr is in use. In the case of reuseport we can stop searching when
we find a matching reuseport.

On a 16-core VM a test program that spawns 16 threads that each bind to
1024 sockets (one per 10ms) takes 1m45s. With this change it takes 11s.

Also add a cond_resched() when the port is not specified.
Signed-off-by: NEric Garver <e@erig.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df560056

25 12月, 2016 1 次提交

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

由 Linus Torvalds 提交于 12月 24, 2016

This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.
Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7c0f6ba6

10 12月, 2016 4 次提交

udp: udp_rmem_release() should touch sk_rmem_alloc later · 02ab0d13

由 Eric Dumazet 提交于 12月 08, 2016

In flood situations, keeping sk_rmem_alloc at a high value
prevents producers from touching the socket.

It makes sense to lower sk_rmem_alloc only at the end
of udp_rmem_release() after the thread draining receive
queue in udp_recvmsg() finished the writes to sk_forward_alloc.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02ab0d13

udp: add batching to udp_rmem_release() · 6b229cf7

由 Eric Dumazet 提交于 12月 08, 2016

If udp_recvmsg() constantly releases sk_rmem_alloc
for every read packet, it gives opportunity for
producers to immediately grab spinlocks and desperatly
try adding another packet, causing false sharing.

We can add a simple heuristic to give the signal
by batches of ~25 % of the queue capacity.

This patch considerably increases performance under
flood by about 50 %, since the thread draining the queue
is no longer slowed by false sharing.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b229cf7

udp: copy skb->truesize in the first cache line · c84d9490

由 Eric Dumazet 提交于 12月 08, 2016

In UDP RX handler, we currently clear skb->dev before skb
is added to receive queue, because device pointer is no longer
available once we exit from RCU section.

Since this first cache line is always hot, lets reuse this space
to store skb->truesize and thus avoid a cache line miss at
udp_recvmsg()/udp_skb_destructor time while receive queue
spinlock is held.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c84d9490

udp: add busylocks in RX path · 4b272750

由 Eric Dumazet 提交于 12月 08, 2016

Idea of busylocks is to let producers grab an extra spinlock
to relieve pressure on the receive_queue spinlock shared by consumer.

This behavior is requested only once socket receive queue is above
half occupancy.

Under flood, this means that only one producer can be in line
trying to acquire the receive_queue spinlock.

These busylock can be allocated on a per cpu manner, instead of a
per socket one (that would consume a cache line per socket)

This patch considerably improves UDP behavior under stress,
depending on number of NIC RX queues and/or RPS spread.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b272750

09 12月, 2016 1 次提交

udp: under rx pressure, try to condense skbs · c8c8b127

由 Eric Dumazet 提交于 12月 07, 2016

Under UDP flood, many softirq producers try to add packets to
UDP receive queue, and one user thread is burning one cpu trying
to dequeue packets as fast as possible.

Two parts of the per packet cost are :
- copying payload from kernel space to user space,
- freeing memory pieces associated with skb.

If socket is under pressure, softirq handler(s) can try to pull in
skb->head the payload of the packet if it fits.

Meaning the softirq handler(s) can free/reuse the page fragment
immediately, instead of letting udp_recvmsg() do this hundreds of usec
later, possibly from another node.

Additional gains :
- We reduce skb->truesize and thus can store more packets per SO_RCVBUF
- We avoid cache line misses at copyout() time and consume_skb() time,
and avoid one put_page() with potential alien freeing on NUMA hosts.

This comes at the cost of a copy, bounded to available tail room, which
is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger
than necessary)

This patch gave me about 5 % increase in throughput in my tests.

skb_condense() helper could probably used in other contexts.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8c8b127

04 12月, 2016 1 次提交

udp: be less conservative with sock rmem accounting · 363dc73a

由 Paolo Abeni 提交于 12月 02, 2016

Before commit 850cbadd ("udp: use it's own memory accounting
schema"), the udp protocol allowed sk_rmem_alloc to grow beyond
the rcvbuf by the whole current packet's truesize. After said commit
we allow sk_rmem_alloc to exceed the rcvbuf only if the receive queue
is empty. As reported by Jesper this cause a performance regression
for some (small) values of rcvbuf.

This commit is intended to fix the regression restoring the old
handling of the rcvbuf limit.
Reported-by: NJesper Dangaard Brouer <brouer@redhat.com>
Fixes: 850cbadd ("udp: use it's own memory accounting schema")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

363dc73a

25 11月, 2016 1 次提交

udplite: call proper backlog handlers · 30c7be26

由 Eric Dumazet 提交于 11月 22, 2016

In commits 93821778 ("udp: Fix rcv socket locking") and
f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into
__udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite
was forgotten.

This leads to crashes if UDPlite header is pulled twice, which happens
starting from commit e6afc8ac ("udp: remove headers from UDP packets
before queueing")

Bug found by syzkaller team, thanks a lot guys !

Note that backlog use in UDP/UDPlite is scheduled to be removed starting
from linux-4.10, so this patch is only needed up to linux-4.9

Fixes: 93821778 ("udp: Fix rcv socket locking")
Fixes: f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb")
Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30c7be26

22 11月, 2016 1 次提交

udp: avoid one cache line miss in recvmsg() · d21dbdfe

由 Eric Dumazet 提交于 11月 18, 2016

UDP_SKB_CB(skb)->partial_cov is located at offset 66 in skb,
requesting a cold cache line being read in cpu cache.

We can avoid this cache line miss for UDP sockets,
as partial_cov has a meaning only for UDPLite.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d21dbdfe

18 11月, 2016 1 次提交

udp: enable busy polling for all sockets · e68b6e50

由 Eric Dumazet 提交于 11月 16, 2016

UDP busy polling is restricted to connected UDP sockets.

This is because sk_busy_loop() only takes care of one NAPI context.

There are cases where it could be extended.

1) Some hosts receive traffic on a single NIC, with one RX queue.

2) Some applications use SO_REUSEPORT and associated BPF filter
   to split the incoming traffic on one UDP socket per RX
queue/thread/cpu

3) Some UDP sockets are used to send/receive traffic for one flow, but
they do not bother with connect()

This patch records the napi_id of first received skb, giving more
reach to busy polling.

Tested:

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read

lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done

Before patch :
   27867   28870   37324   41060   41215
   36764   36838   44455   41282   43843
After patch :
   73920   73213   70147   74845   71697
   68315   68028   75219   70082   73707
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e68b6e50

16 11月, 2016 2 次提交

udp: restore UDPlite many-cast delivery · 73e2d5e3

由 Pablo Neira 提交于 11月 14, 2016

Honor udptable parameter that is passed to __udp*_lib_mcast_deliver(),
otherwise udplite broadcast/multicast use the wrong table and it breaks.

Fixes: 2dc41cff ("udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73e2d5e3

udplite: fix NULL pointer dereference · c915fe13

由 Paolo Abeni 提交于 11月 15, 2016

The commit 850cbadd ("udp: use it's own memory accounting schema")
assumes that the socket proto has memory accounting enabled,
but this is not the case for UDPLITE.
Fix it enabling memory accounting for UDPLITE and performing
fwd allocated memory reclaiming on socket shutdown.
UDP and UDPLITE share now the same memory accounting limits.
Also drop the backlog receive operation, since is no more needed.

Fixes: 850cbadd ("udp: use it's own memory accounting schema")
Reported-by: NAndrei Vagin <avagin@gmail.com>
Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c915fe13

10 11月, 2016 1 次提交

udp: provide udp{4,6}_lib_lookup for nf_socket_ipv{4,6} · 30f58158

由 Arnd Bergmann 提交于 11月 08, 2016

Since commit ca065d0c ("udp: no longer use SLAB_DESTROY_BY_RCU")
the udp6_lib_lookup and udp4_lib_lookup functions are only
provided when it is actually possible to call them.

However, moving the callers now caused a link error:

net/built-in.o: In function `nf_sk_lookup_slow_v6':
(.text+0x131a39): undefined reference to `udp6_lib_lookup'
net/ipv4/netfilter/nf_socket_ipv4.o: In function `nf_sk_lookup_slow_v4':
nf_socket_ipv4.c:(.text.nf_sk_lookup_slow_v4+0x114): undefined reference to `udp4_lib_lookup'

This extends the #ifdef so we also provide the functions when
CONFIG_NF_SOCKET_IPV4 or CONFIG_NF_SOCKET_IPV6, respectively
are set.

Fixes: 8db4c5be ("netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

30f58158

08 11月, 2016 2 次提交

udp: do fwd memory scheduling on dequeue · 7c13f97f

由 Paolo Abeni 提交于 11月 04, 2016

A new argument is added to __skb_recv_datagram to provide
an explicit skb destructor, invoked under the receive queue
lock.
The UDP protocol uses such argument to perform memory
reclaiming on dequeue, so that the UDP protocol does not
set anymore skb->desctructor.
Instead explicit memory reclaiming is performed at close() time and
when skbs are removed from the receive queue.
The in kernel UDP protocol users now need to call a
skb_recv_udp() variant instead of skb_recv_datagram() to
properly perform memory accounting on dequeue.

Overall, this allows acquiring only once the receive queue
lock on dequeue.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr sinks	vanilla		patched
1		440		560
3		2150		2300
6		3650		3800
9		4450		4600
12		6250		6450

v1 -> v2:
 - do rmem and allocated memory scheduling under the receive lock
 - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
 - avoid the typdef for the dequeue callback
Suggested-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c13f97f

net/sock: add an explicit sk argument for ip_cmsg_recv_offset() · ad959036

由 Paolo Abeni 提交于 11月 04, 2016

So that we can use it even after orphaining the skbuff.
Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad959036

05 11月, 2016 1 次提交

net: inet: Support UID-based routing in IP protocols. · e2d118a1

由 Lorenzo Colitti 提交于 11月 04, 2016

- Use the UID in routing lookups made by protocol connect() and
  sendmsg() functions.
- Make sure that routing lookups triggered by incoming packets
  (e.g., Path MTU discovery) take the UID of the socket into
  account.
- For packets not associated with a userspace socket, (e.g., ping
  replies) use UID 0 inside the user namespace corresponding to
  the network namespace the socket belongs to. This allows
  all namespaces to apply routing and iptables rules to
  kernel-originated traffic in that namespaces by matching UID 0.
  This is better than using the UID of the kernel socket that is
  sending the traffic, because the UID of kernel sockets created
  at namespace creation time (e.g., the per-processor ICMP and
  TCP sockets) is the UID of the user that created the socket,
  which might not be mapped in the namespace.

Tested: compiles allnoconfig, allyesconfig, allmodconfig
Tested: https://android-review.googlesource.com/253302Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2d118a1

27 10月, 2016 1 次提交

udp: fix IP_CHECKSUM handling · 10df8e61

由 Eric Dumazet 提交于 10月 23, 2016

First bug was added in commit ad6f939a ("ip: Add offset parameter to
ip_cmsg_recv") : Tom missed that ipv4 udp messages could be received on
AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by
ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr));

Then commit e6afc8ac ("udp: remove headers from UDP packets before
queueing") forgot to adjust the offsets now UDP headers are pulled
before skb are put in receive queue.

Fixes: ad6f939a ("ip: Add offset parameter to ip_cmsg_recv")
Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Sam Kumar <samanthakumar@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Tested-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10df8e61

23 10月, 2016 2 次提交

udp: use it's own memory accounting schema · 850cbadd

由 Paolo Abeni 提交于 10月 21, 2016

Completely avoid default sock memory accounting and replace it
with udp-specific accounting.

Since the new memory accounting model encapsulates completely
the required locking, remove the socket lock on both enqueue and
dequeue, and avoid using the backlog on enqueue.

Be sure to clean-up rx queue memory on socket destruction, using
udp its own sk_destruct.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr readers      Kpps (vanilla)  Kpps (patched)
1               170             440
3               1250            2150
6               3000            3650
9               4200            4450
12              5700            6250

v4 -> v5:
  - avoid unneeded test in first_packet_length

v3 -> v4:
  - remove useless sk_rcvqueues_full() call

v2 -> v3:
  - do not set the now unsed backlog_rcv callback

v1 -> v2:
  - add memory pressure support
  - fixed dropwatch accounting for ipv6
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

850cbadd

udp: implement memory accounting helpers · f970bd9e

由 Paolo Abeni 提交于 10月 21, 2016

Avoid using the generic helpers.
Use the receive queue spin lock to protect the memory
accounting operation, both on enqueue and on dequeue.

On dequeue perform partial memory reclaiming, trying to
leave a quantum of forward allocated memory.

On enqueue use a custom helper, to allow some optimizations:
- use a plain spin_lock() variant instead of the slightly
  costly spin_lock_irqsave(),
- avoid dst_force check, since the calling code has already
  dropped the skb dst
- avoid orphaning the skb, since skb_steal_sock() already did
  the work for us

The above needs custom memory reclaiming on shutdown, provided
by the udp_destruct_sock().

v5 -> v6:
  - don't orphan the skb on enqueue

v4 -> v5:
  - replace the mem_lock with the receive queue spin lock
  - ensure that the bh is always allowed to enqueue at least
    a skb, even if sk_rcvbuf is exceeded

v3 -> v4:
  - reworked memory accunting, simplifying the schema
  - provide an helper for both memory scheduling and enqueuing

v1 -> v2:
  - use a udp specific destrctor to perform memory reclaiming
  - remove a couple of helpers, unneeded after the above cleanup
  - do not reclaim memory on dequeue if not under memory
    pressure
  - reworked the fwd accounting schema to avoid potential
    integer overflow
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f970bd9e

21 10月, 2016 1 次提交

udp: must lock the socket in udp_disconnect() · 286c72de

由 Eric Dumazet 提交于 10月 20, 2016

Baozeng Ding reported KASAN traces showing uses after free in
udp_lib_get_port() and other related UDP functions.

A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash.

I could write a reproducer with two threads doing :

static int sock_fd;
static void *thr1(void *arg)
{
	for (;;) {
		connect(sock_fd, (const struct sockaddr *)arg,
			sizeof(struct sockaddr_in));
	}
}

static void *thr2(void *arg)
{
	struct sockaddr_in unspec;

	for (;;) {
		memset(&unspec, 0, sizeof(unspec));
	        connect(sock_fd, (const struct sockaddr *)&unspec,
			sizeof(unspec));
        }
}

Problem is that udp_disconnect() could run without holding socket lock,
and this was causing list corruptions.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NBaozeng Ding <sploving1@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

286c72de

11 9月, 2016 1 次提交

net: ipv4: Remove l3mdev_get_saddr · d66f6c0a

由 David Ahern 提交于 9月 10, 2016

No longer needed
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d66f6c0a

24 8月, 2016 4 次提交

udp: get rid of sk_prot_clear_portaddr_nulls() · 4cac8204

由 Eric Dumazet 提交于 8月 23, 2016

Since we no longer use SLAB_DESTROY_BY_RCU for UDP,
we do not need sk_prot_clear_portaddr_nulls() helper.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cac8204

net: diag: support SOCK_DESTROY for UDP sockets · 5d77dca8

由 David Ahern 提交于 8月 23, 2016

This implements SOCK_DESTROY for UDP sockets similar to what was done
for TCP with commit c1e64e29 ("net: diag: Support destroying TCP
sockets.") A process with a UDP socket targeted for destroy is awakened
and recvmsg fails with ECONNABORTED.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d77dca8

udp: get rid of SLAB_DESTROY_BY_RCU allocations · 75d855a5

由 Eric Dumazet 提交于 8月 23, 2016

After commit ca065d0c ("udp: no longer use SLAB_DESTROY_BY_RCU")
we do not need this special allocation mode anymore, even if it is
harmless.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

75d855a5

udp: fix poll() issue with zero sized packets · e83c6744

由 Eric Dumazet 提交于 8月 23, 2016

Laura tracked poll() [and friends] regression caused by commit
e6afc8ac ("udp: remove headers from UDP packets before queueing")

udp_poll() needs to know if there is a valid packet in receive queue,
even if its payload length is 0.

Change first_packet_length() to return an signed int, and use -1
as the indication of an empty queue.

Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
Reported-by: NLaura Abbott <labbott@redhat.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Tested-by: NLaura Abbott <labbott@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e83c6744

20 8月, 2016 1 次提交

udp: include addrconf.h · 217375a0

由 Eric Dumazet 提交于 8月 18, 2016

Include ipv4_rcv_saddr_equal() definition to avoid this sparse error :

net/ipv4/udp.c:362:5: warning: symbol 'ipv4_rcv_saddr_equal' was not
declared. Should it be static?
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

217375a0

26 7月, 2016 1 次提交

udp: use sk_filter_trim_cap for udp{,6}_queue_rcv_skb · ba66bbe5

由 Daniel Borkmann 提交于 7月 25, 2016

After a6127697 ("udp: prevent bugcheck if filter truncates packet
too much"), there followed various other fixes for similar cases such
as f4979fce ("rose: limit sk_filter trim to payload").

Latter introduced a new helper sk_filter_trim_cap(), where we can pass
the trim limit directly to the socket filter handling. Make use of it
here as well with sizeof(struct udphdr) as lower cap limit and drop the
extra skb->len test in UDP's input path.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba66bbe5

12 7月, 2016 1 次提交

udp: prevent bugcheck if filter truncates packet too much · a6127697

由 Michal Kubeček 提交于 7月 08, 2016

If socket filter truncates an udp packet below the length of UDP header
in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a
BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if
kernel is configured that way) can be easily enforced by an unprivileged
user which was reported as CVE-2016-6162. For a reproducer, see
http://seclists.org/oss-sec/2016/q3/8

Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
Reported-by: NMarco Grassi <marco.gra@gmail.com>
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Acked-by: NEric Dumazet <edumazet@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6127697

15 6月, 2016 2 次提交

udp reuseport: fix packet of same flow hashed to different socket · d1e37288

由 Su, Xuemin 提交于 6月 13, 2016

There is a corner case in which udp packets belonging to a same
flow are hashed to different socket when hslot->count changes from 10
to 11:

1) When hslot->count <= 10, __udp_lib_lookup() searches udp_table->hash,
and always passes 'daddr' to udp_ehashfn().

2) When hslot->count > 10, __udp_lib_lookup() searches udp_table->hash2,
but may pass 'INADDR_ANY' to udp_ehashfn() if the sockets are bound to
INADDR_ANY instead of some specific addr.

That means when hslot->count changes from 10 to 11, the hash calculated by
udp_ehashfn() is also changed, and the udp packets belonging to a same
flow will be hashed to different socket.

This is easily reproduced:
1) Create 10 udp sockets and bind all of them to 0.0.0.0:40000.
2) From the same host send udp packets to 127.0.0.1:40000, record the
socket index which receives the packets.
3) Create 1 more udp socket and bind it to 0.0.0.0:44096. The number 44096
is 40000 + UDP_HASH_SIZE(4096), this makes the new socket put into the
same hslot as the aformentioned 10 sockets, and makes the hslot->count
change from 10 to 11.
4) From the same host send udp packets to 127.0.0.1:40000, and the socket
index which receives the packets will be different from the one received
in step 2.
This should not happen as the socket bound to 0.0.0.0:44096 should not
change the behavior of the sockets bound to 0.0.0.0:40000.

It's the same case for IPv6, and this patch also fixes that.
Signed-off-by: NSu, Xuemin <suxm@chinanetcenter.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1e37288

ipv4: fix checksum annotation in udp4_csum_init · b46d9f62

由 Hannes Frederic Sowa 提交于 6月 12, 2016

Reported-by: NCong Wang <xiyou.wangcong@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Tom Herbert <tom@herbertland.com>
Fixes: 4068579e ("net: Implmement RFC 6936 (zero RX csums for UDP/IPv6")
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b46d9f62

03 6月, 2016 1 次提交

Possible problem with ("udp: remove headers from UDP packets before queueing") · ce25d66a

由 Eric Dumazet 提交于 6月 02, 2016

Paul Moore tracked a regression caused by a recent commit, which
mistakenly assumed that sk_filter() could be avoided if socket
had no current BPF filter.

The intent was to avoid udp_lib_checksum_complete() overhead.

But sk_filter() also checks skb_pfmemalloc() and
security_sock_rcv_skb(), so better call it.

Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NPaul Moore <paul@paul-moore.com>
Tested-by: NPaul Moore <paul@paul-moore.com>
Tested-by: NStephen Smalley <sds@tycho.nsa.gov>
Cc: samanthakumar <samanthakumar@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce25d66a

21 5月, 2016 1 次提交

udp: prevent skbs lingering in tunnel socket queues · e5aed006

由 Hannes Frederic Sowa 提交于 5月 19, 2016

In case we find a socket with encapsulation enabled we should call
the encap_recv function even if just a udp header without payload is
available. The callbacks are responsible for correctly verifying and
dropping the packets.

Also, in case the header validation fails for geneve and vxlan we
shouldn't put the skb back into the socket queue, no one will pick
them up there.  Instead we can simply discard them in the respective
encap_recv functions.
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e5aed006

13 5月, 2016 1 次提交

udp: Resolve NULL pointer dereference over flow-based vxlan device · ed7cbbce

由 Alexander Duyck 提交于 5月 12, 2016

While testing an OpenStack configuration using VXLANs I saw the following
call trace:

 RIP: 0010:[<ffffffff815fad49>] udp4_lib_lookup_skb+0x49/0x80
 RSP: 0018:ffff88103867bc50  EFLAGS: 00010286
 RAX: ffff88103269bf00 RBX: ffff88103269bf00 RCX: 00000000ffffffff
 RDX: 0000000000004300 RSI: 0000000000000000 RDI: ffff880f2932e780
 RBP: ffff88103867bc60 R08: 0000000000000000 R09: 000000009001a8c0
 R10: 0000000000004400 R11: ffffffff81333a58 R12: ffff880f2932e794
 R13: 0000000000000014 R14: 0000000000000014 R15: ffffe8efbfd89ca0
 FS:  0000000000000000(0000) GS:ffff88103fd80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000488 CR3: 0000000001c06000 CR4: 00000000001426e0
 Stack:
  ffffffff81576515 ffffffff815733c0 ffff88103867bc98 ffffffff815fcc17
  ffff88103269bf00 ffffe8efbfd89ca0 0000000000000014 0000000000000080
  ffffe8efbfd89ca0 ffff88103867bcc8 ffffffff815fcf8b ffff880f2932e794
 Call Trace:
  [<ffffffff81576515>] ? skb_checksum+0x35/0x50
  [<ffffffff815733c0>] ? skb_push+0x40/0x40
  [<ffffffff815fcc17>] udp_gro_receive+0x57/0x130
  [<ffffffff815fcf8b>] udp4_gro_receive+0x10b/0x2c0
  [<ffffffff81605863>] inet_gro_receive+0x1d3/0x270
  [<ffffffff81589e59>] dev_gro_receive+0x269/0x3b0
  [<ffffffff8158a1b8>] napi_gro_receive+0x38/0x120
  [<ffffffffa0871297>] gro_cell_poll+0x57/0x80 [vxlan]
  [<ffffffff815899d0>] net_rx_action+0x160/0x380
  [<ffffffff816965c7>] __do_softirq+0xd7/0x2c5
  [<ffffffff8107d969>] run_ksoftirqd+0x29/0x50
  [<ffffffff8109a50f>] smpboot_thread_fn+0x10f/0x160
  [<ffffffff8109a400>] ? sort_range+0x30/0x30
  [<ffffffff81096da8>] kthread+0xd8/0xf0
  [<ffffffff81693c82>] ret_from_fork+0x22/0x40
  [<ffffffff81096cd0>] ? kthread_park+0x60/0x60

The following trace is seen when receiving a DHCP request over a flow-based
VXLAN tunnel.  I believe this is caused by the metadata dst having a NULL
dev value and as a result dev_net(dev) is causing a NULL pointer dereference.

To resolve this I am replacing the check for skb_dst(skb)->dev with just
skb->dev.  This makes sense as the callers of this function are usually in
the receive path and as such skb->dev should always be populated.  In
addition other functions in the area where these are called are already
using dev_net(skb->dev) to determine the namespace the UDP packet belongs
in.

Fixes: 63058308 ("udp: Add udp6_lib_lookup_skb and udp4_lib_lookup_skb")
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ed7cbbce

03 5月, 2016 1 次提交

udp: prepare for non BH masking at backlog processing · e61da9e2

由 Eric Dumazet 提交于 4月 29, 2016

UDP uses the generic socket backlog code, and this will soon
be changed to not disable BH when protocol is called back.

We need to use appropriate SNMP accessors.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e61da9e2

28 4月, 2016 3 次提交

net: udp: rename UDP_INC_STATS_BH() · 02c22347

由 Eric Dumazet 提交于 4月 27, 2016

Rename UDP_INC_STATS_BH() to __UDP_INC_STATS(),
and UDP6_INC_STATS_BH() to __UDP6_INC_STATS()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02c22347

net: rename ICMP_INC_STATS_BH() · 5d3848bc

由 Eric Dumazet 提交于 4月 27, 2016

Rename ICMP_INC_STATS_BH() to __ICMP_INC_STATS()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d3848bc

net: snmp: kill various STATS_USER() helpers · 6aef70a8

由 Eric Dumazet 提交于 4月 27, 2016

In the old days (before linux-3.0), SNMP counters were duplicated,
one for user context, and one for BH context.

After commit 8f0ea0fe ("snmp: reduce percpu needs by 50%")
we have a single copy, and what really matters is preemption being
enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
respectively.

We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(),
NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(),
SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(),
UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER()

Following patches will rename __BH helpers to make clear their
usage is not tied to BH being disabled.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6aef70a8

19 4月, 2016 1 次提交

udp: fix if statement in SIOCINQ ioctl · 110361f4

由 Dan Carpenter 提交于 4月 18, 2016

We deleted a line of code and accidentally made the "return put_user()"
part of the if statement when it's supposed to be unconditional.

Fixes: 9f9a45be ('udp: do not expect udp headers on ioctl SIOCINQ')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

110361f4

15 4月, 2016 1 次提交

soreuseport: fix ordering for mixed v4/v6 sockets · d894ba18

由 Craig Gallek 提交于 4月 12, 2016

With the SO_REUSEPORT socket option, it is possible to create sockets
in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address.
This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on
the AF_INET6 sockets.

Prior to the commits referenced below, an incoming IPv4 packet would
always be routed to a socket of type AF_INET when this mixed-mode was used.
After those changes, the same packet would be routed to the most recently
bound socket (if this happened to be an AF_INET6 socket, it would
have an IPv4 mapped IPv6 address).

The change in behavior occurred because the recent SO_REUSEPORT optimizations
short-circuit the socket scoring logic as soon as they find a match. They
did not take into account the scoring logic that favors AF_INET sockets
over AF_INET6 sockets in the event of a tie.

To fix this problem, this patch changes the insertion order of AF_INET
and AF_INET6 addresses in the TCP and UDP socket lists when the sockets
have SO_REUSEPORT set. AF_INET sockets will be inserted at the head of the
list and AF_INET6 sockets with SO_REUSEPORT set will always be inserted at
the tail of the list. This will force AF_INET sockets to always be
considered first.

Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
Fixes: 125e80b88687 ("soreuseport: fast reuseport TCP socket selection")
Reported-by: NMaciej Żenczykowski <maze@google.com>
Signed-off-by: NCraig Gallek <kraig@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d894ba18

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功