提交 · 02c223470c3cc30e5ff90217abea761679553ac3 · openeuler / raspberrypi-kernel

28 4月, 2016 3 次提交

net: udp: rename UDP_INC_STATS_BH() · 02c22347

由 Eric Dumazet 提交于 4月 27, 2016

Rename UDP_INC_STATS_BH() to __UDP_INC_STATS(),
and UDP6_INC_STATS_BH() to __UDP6_INC_STATS()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02c22347

net: rename ICMP_INC_STATS_BH() · 5d3848bc

由 Eric Dumazet 提交于 4月 27, 2016

Rename ICMP_INC_STATS_BH() to __ICMP_INC_STATS()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d3848bc

net: snmp: kill various STATS_USER() helpers · 6aef70a8

由 Eric Dumazet 提交于 4月 27, 2016

In the old days (before linux-3.0), SNMP counters were duplicated,
one for user context, and one for BH context.

After commit 8f0ea0fe ("snmp: reduce percpu needs by 50%")
we have a single copy, and what really matters is preemption being
enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
respectively.

We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(),
NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(),
SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(),
UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER()

Following patches will rename __BH helpers to make clear their
usage is not tied to BH being disabled.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6aef70a8

19 4月, 2016 1 次提交

udp: fix if statement in SIOCINQ ioctl · 110361f4

由 Dan Carpenter 提交于 4月 18, 2016

We deleted a line of code and accidentally made the "return put_user()"
part of the if statement when it's supposed to be unconditional.

Fixes: 9f9a45be ('udp: do not expect udp headers on ioctl SIOCINQ')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

110361f4

15 4月, 2016 1 次提交

soreuseport: fix ordering for mixed v4/v6 sockets · d894ba18

由 Craig Gallek 提交于 4月 12, 2016

With the SO_REUSEPORT socket option, it is possible to create sockets
in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address.
This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on
the AF_INET6 sockets.

Prior to the commits referenced below, an incoming IPv4 packet would
always be routed to a socket of type AF_INET when this mixed-mode was used.
After those changes, the same packet would be routed to the most recently
bound socket (if this happened to be an AF_INET6 socket, it would
have an IPv4 mapped IPv6 address).

The change in behavior occurred because the recent SO_REUSEPORT optimizations
short-circuit the socket scoring logic as soon as they find a match. They
did not take into account the scoring logic that favors AF_INET sockets
over AF_INET6 sockets in the event of a tie.

To fix this problem, this patch changes the insertion order of AF_INET
and AF_INET6 addresses in the TCP and UDP socket lists when the sockets
have SO_REUSEPORT set. AF_INET sockets will be inserted at the head of the
list and AF_INET6 sockets with SO_REUSEPORT set will always be inserted at
the tail of the list. This will force AF_INET sockets to always be
considered first.

Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
Fixes: 125e80b88687 ("soreuseport: fast reuseport TCP socket selection")
Reported-by: NMaciej Żenczykowski <maze@google.com>
Signed-off-by: NCraig Gallek <kraig@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d894ba18

14 4月, 2016 2 次提交

udp: do not expect udp headers in recv cmsg IP_CMSG_CHECKSUM · 31c2e492

由 Willem de Bruijn 提交于 4月 07, 2016

On udp sockets, recv cmsg IP_CMSG_CHECKSUM returns a checksum over
the packet payload. Since commit e6afc8ac pulled the headers,
taking skb->data as the start of transport header is incorrect. Use
the transport header pointer.

Also, when peeking at an offset from the start of the packet, only
return a checksum from the start of the peeked data. Note that the
cmsg does not subtract a tail checkum when reading truncated data.

Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31c2e492

udp: do not expect udp headers on ioctl SIOCINQ · 9f9a45be

由 Willem de Bruijn 提交于 4月 07, 2016

On udp sockets, ioctl SIOCINQ returns the payload size of the first
packet. Since commit e6afc8ac pulled the headers, the result is
incorrect when subtracting header length. Remove that operation.

Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f9a45be

08 4月, 2016 1 次提交

udp: Add udp6_lib_lookup_skb and udp4_lib_lookup_skb · 63058308

由 Tom Herbert 提交于 4月 05, 2016

Add externally visible functions to lookup a UDP socket by skb. This
will be used for GRO in UDP sockets. These functions also check
if skb->dst is set, and if it is not skb->dev is used to get dev_net.
This allows calling lookup functions before dst has been set on the
skbuff.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63058308

06 4月, 2016 2 次提交

udp: enable MSG_PEEK at non-zero offset · 627d2d6b

由 samanthakumar 提交于 4月 05, 2016

Enable peeking at UDP datagrams at the offset specified with socket
option SOL_SOCKET/SO_PEEK_OFF. Peek at any datagram in the queue, up
to the end of the given datagram.

Implement the SO_PEEK_OFF semantics introduced in commit ef64a54f
("sock: Introduce the SO_PEEK_OFF sock option"). Increase the offset
on peek, decrease it on regular reads.

When peeking, always checksum the packet immediately, to avoid
recomputation on subsequent peeks and final read.

The socket lock is not held for the duration of udp_recvmsg, so
peek and read operations can run concurrently. Only the last store
to sk_peek_off is preserved.
Signed-off-by: NSam Kumar <samanthakumar@google.com>
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

627d2d6b

udp: remove headers from UDP packets before queueing · e6afc8ac

由 samanthakumar 提交于 4月 05, 2016

Remove UDP transport headers before queueing packets for reception.
This change simplifies a follow-up patch to add MSG_PEEK support.
Signed-off-by: NSam Kumar <samanthakumar@google.com>
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6afc8ac

05 4月, 2016 3 次提交

udp: no longer use SLAB_DESTROY_BY_RCU · ca065d0c

由 Eric Dumazet 提交于 4月 01, 2016

Tom Herbert would like not touching UDP socket refcnt for encapsulated
traffic. For this to happen, we need to use normal RCU rules, with a grace
period before freeing a socket. UDP sockets are not short lived in the
high usage case, so the added cost of call_rcu() should not be a concern.

This actually removes a lot of complexity in UDP stack.

Multicast receives no longer need to hold a bucket spinlock.

Note that ip early demux still needs to take a reference on the socket.

Same remark for functions used by xt_socket and xt_PROXY netfilter modules,
but this might be changed later.

Performance for a single UDP socket receiving flood traffic from
many RX queues/cpus.

Simple udp_rx using simple recvfrom() loop :
438 kpps instead of 374 kpps : 17 % increase of the peak rate.

v2: Addressed Willem de Bruijn feedback in multicast handling
 - keep early demux break in __udp4_lib_demux_lookup()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Willem de Bruijn <willemb@google.com>
Tested-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca065d0c

sock: enable timestamping using control messages · c14ac945

由 Soheil Hassas Yeganeh 提交于 4月 02, 2016

Currently, SOL_TIMESTAMPING can only be enabled using setsockopt.
This is very costly when users want to sample writes to gather
tx timestamps.

Add support for enabling SO_TIMESTAMPING via control messages by
using tsflags added in `struct sockcm_cookie` (added in the previous
patches in this series) to set the tx_flags of the last skb created in
a sendmsg. With this patch, the timestamp recording bits in tx_flags
of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg.

Please note that this is only effective for overriding the recording
timestamps flags. Users should enable timestamp reporting (e.g.,
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using
socket options and then should ask for SOF_TIMESTAMPING_TX_*
using control messages per sendmsg to sample timestamps for each
write.
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c14ac945

ipv4: process socket-level control messages in IPv4 · 24025c46

由 Soheil Hassas Yeganeh 提交于 4月 02, 2016

Process socket-level control messages by invoking
__sock_cmsg_send in ip_cmsg_send for control messages on
the SOL_SOCKET layer.

This makes sure whenever ip_cmsg_send is called in udp, icmp,
and raw, we also process socket-level control messages.

Note that this commit interprets new control messages that
were ignored before. As such, this commit does not change
the behavior of IPv4 control messages.
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24025c46

23 3月, 2016 1 次提交

ipv4: fix broadcast packets reception · ad0ea198

由 Paolo Abeni 提交于 3月 22, 2016

Currently, ingress ipv4 broadcast datagrams are dropped since,
in udp_v4_early_demux(), ip_check_mc_rcu() is invoked even on
bcast packets.

This patch addresses the issue, invoking ip_check_mc_rcu()
only for mcast packets.

Fixes: 6e540309 ("ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad0ea198

13 2月, 2016 1 次提交

ipv4: fix memory leaks in ip_cmsg_send() callers · 91948309

由 Eric Dumazet 提交于 2月 04, 2016

Dmitry reported memory leaks of IP options allocated in
ip_cmsg_send() when/if this function returns an error.

Callers are responsible for the freeing.

Many thanks to Dmitry for the report and diagnostic.
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91948309

12 2月, 2016 2 次提交

net: udp: always set up for CHECKSUM_PARTIAL offload · d75f1306

由 Edward Cree 提交于 2月 11, 2016

If the dst device doesn't support it, it'll get fixed up later anyway
 by validate_xmit_skb().  Also, this allows us to take advantage of LCO
 to avoid summing the payload multiple times.
Signed-off-by: NEdward Cree <ecree@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d75f1306

net: local checksum offload for encapsulation · 179bc67f

由 Edward Cree 提交于 2月 11, 2016

The arithmetic properties of the ones-complement checksum mean that a
correctly checksummed inner packet, including its checksum, has a ones
complement sum depending only on whatever value was used to initialise
the checksum field before checksumming (in the case of TCP and UDP,
this is the ones complement sum of the pseudo header, complemented).
Consequently, if we are going to offload the inner checksum with
CHECKSUM_PARTIAL, we can compute the outer checksum based only on the
packed data not covered by the inner checksum, and the initial value of
the inner checksum field.
Signed-off-by: NEdward Cree <ecree@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

179bc67f

11 2月, 2016 1 次提交

soreuseport: fast reuseport TCP socket selection · c125e80b

由 Craig Gallek 提交于 2月 10, 2016

This change extends the fast SO_REUSEPORT socket lookup implemented
for UDP to TCP.  Listener sockets with SO_REUSEPORT and the same
receive address are additionally added to an array for faster
random access.  This means that only a single socket from the group
must be found in the listener list before any socket in the group can
be used to receive a packet.  Previously, every socket in the group
needed to be considered before handing off the incoming packet.

This feature also exposes the ability to use a BPF program when
selecting a socket from a reuseport group.
Signed-off-by: NCraig Gallek <kraig@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c125e80b

20 1月, 2016 1 次提交

udp: fix potential infinite loop in SO_REUSEPORT logic · ed0dfffd

由 Eric Dumazet 提交于 1月 19, 2016

Using a combination of connected and un-connected sockets, Dmitry
was able to trigger soft lockups with his fuzzer.

The problem is that sockets in the SO_REUSEPORT array might have
different scores.

Right after sk2=socket(), setsockopt(sk2,...,SO_REUSEPORT, on) and
bind(sk2, ...), but _before_ the connect(sk2) is done, sk2 is added into
the soreuseport array, with a score which is smaller than the score of
first socket sk1 found in hash table (I am speaking of the regular UDP
hash table), if sk1 had the connect() done, giving a +8 to its score.

hash bucket [X] -> sk1 -> sk2 -> NULL

sk1 score = 14  (because it did a connect())
sk2 score = 6

SO_REUSEPORT fast selection is an optimization. If it turns out the
score of the selected socket does not match score of first socket, just
fallback to old SO_REUSEPORT logic instead of trying to be too smart.

Normal SO_REUSEPORT users do not mix different kind of sockets, as this
mechanism is used for load balance traffic.

Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Craig Gallek <kraigatgoog@gmail.com>
Acked-by: NCraig Gallek <kraig@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ed0dfffd

06 1月, 2016 1 次提交

soreuseport: pass skb to secondary UDP socket lookup · 1134158b

由 Craig Gallek 提交于 1月 05, 2016

This socket-lookup path did not pass along the skb in question
in my original BPF-based socket selection patch.  The skb in the
udpN_lib_lookup2 path can be used for BPF-based socket selection just
like it is in the 'traditional' udpN_lib_lookup path.

udpN_lib_lookup2 kicks in when there are greater than 10 sockets in
the same hlist slot.  Coincidentally, I chose 10 sockets per
reuseport group in my functional test, so the lookup2 path was not
excersised. This adds an additional set of tests with 20 sockets.

Fixes: 538950a1 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Fixes: 3ca8e402 ("soreuseport: BPF selection functional test")
Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NCraig Gallek <kraig@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1134158b

05 1月, 2016 4 次提交

net: Propagate lookup failure in l3mdev_get_saddr to caller · b5bdacf3

由 David Ahern 提交于 1月 04, 2016

Commands run in a vrf context are not failing as expected on a route lookup:
    root@kenny:~# ip ro ls table vrf-red
    unreachable default

    root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
    ping: Warning: source address might be selected on device other than vrf-red.
    PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data.

    --- 10.100.1.254 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms

Since the vrf table does not have a route for 10.100.1.254 the ping
should have failed. The saddr lookup causes a full VRF table lookup.
Propogating a lookup failure to the user allows the command to fail as
expected:

    root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
    connect: No route to host
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5bdacf3

soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF · 538950a1

由 Craig Gallek 提交于 1月 04, 2016

Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.
Signed-off-by: NCraig Gallek <kraig@google.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

538950a1

soreuseport: fast reuseport UDP socket selection · e32ea7e7

由 Craig Gallek 提交于 1月 04, 2016

Include a struct sock_reuseport instance when a UDP socket binds to
a specific address for the first time with the reuseport flag set.
When selecting a socket for an incoming UDP packet, use the information
available in sock_reuseport if present.

This required adding an additional field to the UDP source address
equality function to differentiate between exact and wildcard matches.
The original use case allowed wildcard matches when checking for
existing port uses during bind.  The new use case of adding a socket
to a reuseport group requires exact address matching.

Performance test (using a machine with 2 CPU sockets and a total of
48 cores):  Create reuseport groups of varying size.  Use one socket
from this group per user thread (pinning each thread to a different
core) calling recvmmsg in a tight loop.  Record number of messages
received per second while saturating a 10G link.
  10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
  20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
  40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)

This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.
Signed-off-by: NCraig Gallek <kraig@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e32ea7e7

udp: properly support MSG_PEEK with truncated buffers · 197c949e

由 Eric Dumazet 提交于 12月 30, 2015

Backport of this upstream commit into stable kernels :
89c22d8c ("net: Fix skb csum races when peeking")
exposed a bug in udp stack vs MSG_PEEK support, when user provides
a buffer smaller than skb payload.

In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                 msg->msg_iov);
returns -EFAULT.

This bug does not happen in upstream kernels since Al Viro did a great
job to replace this into :
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.

For the time being, instead reverting Herbert Xu patch and add back
skb->ip_summed invalid changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.

This patch can be applied on recent kernels as it avoids a double
checksumming, then backported to stable kernels as a bug fix.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

197c949e

16 12月, 2015 1 次提交

net: Eliminate NETIF_F_GEN_CSUM and NETIF_F_V[46]_CSUM · c8cd0989

由 Tom Herbert 提交于 12月 14, 2015

These netif flags are unnecessary convolutions. It is more
straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM,
and NETIF_F_IPV6_CSUM directly.

This patch also:
    - Cleans up can_checksum_protocol
    - Simplifies netdev_intersect_features
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8cd0989

19 11月, 2015 1 次提交

udp: remove duplicate include · 945fae44

由 stephen hemminger 提交于 11月 17, 2015

Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

945fae44

13 10月, 2015 1 次提交

net: SO_INCOMING_CPU setsockopt() support · 70da268b

由 Eric Dumazet 提交于 10月 08, 2015

SO_INCOMING_CPU as added in commit 2c8c56e1 was a getsockopt() command
to fetch incoming cpu handling a particular TCP flow after accept()

This commits adds setsockopt() support and extends SO_REUSEPORT selection
logic : If a TCP listener or UDP socket has this option set, a packet is
delivered to this socket only if CPU handling the packet matches the specified
one.

This allows to build very efficient TCP servers, using one listener per
RX queue, as the associated TCP listener should only accept flows handled
in softirq by the same cpu.
This provides optimal NUMA behavior and keep cpu caches hot.

Note that __inet_lookup_listener() still has to iterate over the list of
all listeners. Following patch puts sk_refcnt in a different cache line
to let this iteration hit only shared and read mostly cache lines.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70da268b

07 10月, 2015 2 次提交

net: Add source address lookup op for VRF · 8cbb512c

由 David Ahern 提交于 10月 05, 2015

Add operation to l3mdev to lookup source address for a given flow.
Add support for the operation to VRF driver and convert existing
IPv4 hooks to use the new lookup.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8cbb512c

net: Rename FLOWI_FLAG_VRFSRC to FLOWI_FLAG_L3MDEV_SRC · 6e2895a8

由 David Ahern 提交于 10月 05, 2015

Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e2895a8

30 9月, 2015 1 次提交

net: Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER · 007979ea

由 David Ahern 提交于 9月 29, 2015

Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER and update the name of the
netif_is_vrf and netif_index_is_vrf macros.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

007979ea

18 9月, 2015 1 次提交

net: Fix vti use case with oif in dst lookups · 58189ca7

由 David Ahern 提交于 9月 15, 2015

Steffen reported that the recent change to add oif to dst lookups breaks
the VTI use case. The problem is that with the oif set in the flow struct
the comparison to the nh_oif is triggered. Fix by splitting the
FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache
bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare
nh oif (FLOWI_FLAG_SKIP_NH_OIF).

Fixes: 42a7b32b ("xfrm: Add oif to dst lookups")
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

58189ca7

14 8月, 2015 1 次提交

udp: Handle VRF device in sendmsg · 9a24abfa

由 David Ahern 提交于 8月 13, 2015

For unconnected UDP sockets using a VRF device lookup source address
based on VRF table. This allows the UDP header to be properly setup
before showing up at the VRF device via the dst.
Signed-off-by: NShrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a24abfa

04 8月, 2015 1 次提交

udp: fix dst races with multicast early demux · 10e2eb87

由 Eric Dumazet 提交于 8月 01, 2015

Multicast dst are not cached. They carry DST_NOCACHE.

As mentioned in commit f8864972 ("ipv4: fix dst race in
sk_dst_get()"), these dst need special care before caching them
into a socket.

Caching them is allowed only if their refcnt was not 0, ie we
must use atomic_inc_not_zero()

Also, we must use READ_ONCE() to fetch sk->sk_rx_dst, as mentioned
in commit d0c294c5 ("tcp: prevent fetching dst twice in early demux
code")

Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
Tested-by: NGregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NGregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz>
Reported-by: NAlex Gartrell <agartrell@fb.com>
Cc: Michal Kubeček <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10e2eb87

04 6月, 2015 1 次提交

ipv4/udp: Verify multicast group is ours in upd_v4_early_demux() · 6e540309

由 Shawn Bohrer 提交于 6月 03, 2015

421b3885 "udp: ipv4: Add udp early
demux" introduced a regression that allowed sockets bound to INADDR_ANY
to receive packets from multicast groups that the socket had not joined.
For example a socket that had joined 224.168.2.9 could also receive
packets from 225.168.2.9 despite not having joined that group if
ip_early_demux is enabled.

Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
that the multicast packet is indeed ours.
Signed-off-by: NShawn Bohrer <sbohrer@rgmadvisors.com>
Reported-by: NYurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e540309

01 6月, 2015 1 次提交

udp: fix behavior of wrong checksums · beb39db5

由 Eric Dumazet 提交于 5月 30, 2015

We have two problems in UDP stack related to bogus checksums :

1) We return -EAGAIN to application even if receive queue is not empty.
   This breaks applications using edge trigger epoll()

2) Under UDP flood, we can loop forever without yielding to other
   processes, potentially hanging the host, especially on non SMP.

This patch is an attempt to make things better.

We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

beb39db5

08 4月, 2015 1 次提交

net: remove extra newlines · 8bc0034c

由 Sheng Yong 提交于 4月 08, 2015

Signed-off-by: NSheng Yong <shengyong1@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8bc0034c

04 4月, 2015 2 次提交

ipv4: coding style: comparison for inequality with NULL · 00db4124

由 Ian Morris 提交于 4月 03, 2015

The ipv4 code uses a mixture of coding styles. In some instances check
for non-NULL pointer is done as x != NULL and sometimes as x. x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.
Signed-off-by: NIan Morris <ipm@chirality.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

00db4124

ipv4: coding style: comparison for equality with NULL · 51456b29

由 Ian Morris 提交于 4月 03, 2015

The ipv4 code uses a mixture of coding styles. In some instances check
for NULL pointer is done as x == NULL and sometimes as !x. !x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.
Signed-off-by: NIan Morris <ipm@chirality.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51456b29

19 3月, 2015 1 次提交

netns: constify net_hash_mix() and various callers · 6eada011

由 Eric Dumazet 提交于 3月 18, 2015

const qualifiers ease code review by making clear
which objects are not written in a function.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6eada011

03 3月, 2015 1 次提交

net: Remove iocb argument from sendmsg and recvmsg · 1b784140

由 Ying Xue 提交于 3月 02, 2015

After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.

Cc: Christoph Hellwig <hch@lst.de>
Suggested-by: NAl Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b784140