提交 · b6e4038262bc933f2ef5427b6bcd2607d02ba4bb · openeuler / raspberrypi-kernel

15 3月, 2016 2 次提交

tipc: make sure IPv6 header fits in skb headroom · 9bd160bf

由 Richard Alpe 提交于 3月 14, 2016

Expand headroom further in order to be able to fit the larger IPv6
header. Prior to this patch this caused a skb under panic for certain
tipc packets when using IPv6 UDP bearer(s).
Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bd160bf

net: add a hardware buffer management helper API · 8cb2d8bf

由 Gregory CLEMENT 提交于 3月 14, 2016

This basic implementation allows to share code between driver using
hardware buffer management. As the code is hardware agnostic, there is
few helpers, most of the optimization brought by the an HW BM has to be
done at driver level.
Tested-by: NSebastian Careba <nitroshift@yahoo.com>
Signed-off-by: NGregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8cb2d8bf

14 3月, 2016 11 次提交

GSO/UDP: Use skb->len instead of udph->len to determine length of original skb · 08334824

由 Alexander Duyck 提交于 3月 11, 2016

It is possible for tunnels to end up generating IP or IPv6 datagrams that
are larger than 64K and expecting to be segmented. As such we need to deal
with length values greater than 64K. In order to accommodate this we need
to update the code to work with a 32b length value instead of a 16b one.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

08334824

ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short · 1e940829

由 Alexander Duyck 提交于 3月 11, 2016

This patch updates csum_ipv6_magic so that it correctly recognizes that
protocol is a unsigned 8 bit value.

This will allow us to better understand what limitations may or may not be
present in how we handle the data.  For example there are a number of
places that call htonl on the protocol value.  This is likely not necessary
and can be replaced with a multiplication by ntohl(1) which will be
converted to a shift by the compiler.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1e940829

ipv4: Don't do expensive useless work during inetdev destroy. · fbd40ea0

由 David S. Miller 提交于 3月 13, 2016

When an inetdev is destroyed, every address assigned to the interface
is removed.  And in this scenerio we do two pointless things which can
be very expensive if the number of assigned interfaces is large:

1) Address promotion.  We are deleting all addresses, so there is no
   point in doing this.

2) A full nf conntrack table purge for every address.  We only need to
   do this once, as is already caught by the existing
   masq_dev_notifier so masq_inet_event() can skip this.
Reported-by: NSolar Designer <solar@openwall.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Tested-by: NCyrill Gorcunov <gorcunov@openvz.org>

fbd40ea0

net: socket: use pr_info_once to tip the obsolete usage of PF_PACKET · f3c98690

由 liping.zhang 提交于 3月 11, 2016

There is no need to use the static variable here, pr_info_once is more
concise.
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f3c98690

net: adjust napi_consume_skb to handle non-NAPI callers · 885eb0a5

由 Jesper Dangaard Brouer 提交于 3月 11, 2016

Some drivers reuse/share code paths that free SKBs between NAPI
and non-NAPI calls. Adjust napi_consume_skb to handle this
use-case.

Before, calls from netpoll (w/ IRQs disabled) was handled and
indicated with a budget zero indication.  Use the same zero
indication to handle calls not originating from NAPI/softirq.
Simply handled by using dev_consume_skb_any().

This adds an extra branch+call for the netpoll case (checking
in_irq() + irqs_disabled()), but that is okay as this is a slowpath.
Suggested-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

885eb0a5

sctp: allow sctp_transmit_packet and others to use gfp · cea8768f

由 Marcelo Ricardo Leitner 提交于 3月 10, 2016

Currently sctp_sendmsg() triggers some calls that will allocate memory
with GFP_ATOMIC even when not necessary. In the case of
sctp_packet_transmit it will allocate a linear skb that will be used to
construct the packet and this may cause sends to fail due to ENOMEM more
often than anticipated specially with big MTUs.

This patch thus allows it to inherit gfp flags from upper calls so that
it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
similar. All others, like retransmits or flushes started from BH, are
still allocated using GFP_ATOMIC.

In netperf tests this didn't result in any performance drawbacks when
memory is not too fragmented and made it trigger ENOMEM way less often.
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cea8768f

ovs: allow nl 'flow set' to use ufid without flow key · 6f15cdbf

由 Samuel Gauthier 提交于 3月 10, 2016

When we want to change a flow using netlink, we have to identify it to
be able to perform a lookup. Both the flow key and unique flow ID
(ufid) are valid identifiers, but we always have to specify the flow
key in the netlink message. When both attributes are there, the ufid
is used. The flow key is used to validate the actions provided by
the userland.

This commit allows to use the ufid without having to provide the flow
key, as it is already done in the netlink 'flow get' and 'flow del'
path. The flow key remains mandatory when an action is provided.
Signed-off-by: NSamuel Gauthier <samuel.gauthier@6wind.com>
Reviewed-by: NSimon Horman <simon.horman@netronome.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f15cdbf

netconf: add macro to represent all attributes · 136ba622

由 Zhang Shengju 提交于 3月 10, 2016

This patch adds macro NETCONFA_ALL to represent all type of netconf
attributes for IPv4 and IPv6.
Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

136ba622

sctp: fix the transports round robin issue when init is retransmitted · 39d2adeb

由 Xin Long 提交于 3月 10, 2016

prior to this patch, at the beginning if we have two paths in one assoc,
they may have the same params other than the last_time_heard, it will try
the paths like this:

1st cycle
  try trans1 fail.
  then trans2 is selected.(cause it's last_time_heard is after trans1).

2nd cycle:
  try  trans2 fail
  then trans2 is selected.(cause it's last_time_heard is after trans1).

3rd cycle:
  try  trans2 fail
  then trans2 is selected.(cause it's last_time_heard is after trans1).

....

trans1 will never have change to be selected, which is not what we expect.
we should keeping round robin all the paths if they are just added at the
beginning.

So at first every tranport's last_time_heard should be initialized 0, so
that we ensure they have the same value at the beginning, only by this,
all the transports could get equal chance to be selected.

Then for sctp_trans_elect_best, it should return the trans_next one when
*trans == *trans_next, so that we can try next if it fails,  but now it
always return trans. so we can fix it by exchanging these two params when
we calls sctp_trans_elect_tie().

Fixes: 4c47af4d ('net: sctp: rework multihoming retransmission path selection to rfc4960')
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

39d2adeb

rxrpc: Replace all unsigned with unsigned int · dad8aff7

由 David Howells 提交于 3月 09, 2016

Replace all "unsigned" types with "unsigned int" types.
Reported-by: NDavid Miller <davem@davemloft.net>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dad8aff7

gro: Defer clearing of flush bit in tunnel paths · c194cf93

由 Alexander Duyck 提交于 3月 09, 2016

This patch updates the GRO handlers for GRE, VXLAN, GENEVE, and FOU so that
we do not clear the flush bit until after we have called the next level GRO
handler. Previously this was being cleared before parsing through the list
of frames, however this resulted in several paths where either the bit
needed to be reset but wasn't as in the case of FOU, or cases where it was
being set as in GENEVE. By just deferring the clearing of the bit until
after the next level protocol has been parsed we can avoid any unnecessary
bit twiddling and avoid bugs.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c194cf93

12 3月, 2016 4 次提交

bpf: support flow label for bpf_skb_{set, get}_tunnel_key · 4018ab18

由 Daniel Borkmann 提交于 3月 09, 2016

This patch extends bpf_tunnel_key with a tunnel_label member, that maps
to ip_tunnel_key's label so underlying backends like vxlan and geneve
can propagate the label to udp_tunnel6_xmit_skb(), where it's being set
in the IPv6 header. It allows for having 20 more bits to encode/decode
flow related meta information programmatically. Tested with vxlan and
geneve.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4018ab18

ip_tunnel: add support for setting flow label via collect metadata · 13461144

由 Daniel Borkmann 提交于 3月 09, 2016

This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label
from call sites. Currently, there's no such option and it's always set to
zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so
that flow-based tunnels via collect metadata frontends can make use of it.
vxlan and geneve will be converted to add flow label support separately.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13461144

bridge: allow zero ageing time · 4c656c13

由 Stephen Hemminger 提交于 3月 08, 2016

This fixes a regression in the bridge ageing time caused by:
commit c62987bb ("bridge: push bridge setting ageing_time down to switchdev")

There are users of Linux bridge which use the feature that if ageing time
is set to 0 it causes entries to never expire. See:
  https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge

For a pure software bridge, it is unnecessary for the code to have
arbitrary restrictions on what values are allowable.
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c656c13

net/flower: Fix pointer cast · 8208d21b

由 Amir Vadai 提交于 3月 11, 2016

Cast pointer to unsigned long instead of u64, to fix compilation warning
on 32 bit arch, spotted by 0day build.

Fixes: 5b33f488 ("net/flower: Introduce hardware offload support")
Signed-off-by: NAmir Vadai <amir@vadai.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8208d21b

11 3月, 2016 3 次提交

net/flow_dissector: Make dissector_uses_key() and skb_flow_dissector_target() public · 8de2d793

由 Amir Vadai 提交于 3月 08, 2016

Will be used in a following patch to query if a key is being used, and
what it's value in the target object.
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NAmir Vadai <amir@vadai.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8de2d793

net/flower: Introduce hardware offload support · 5b33f488

由 Amir Vadai 提交于 3月 08, 2016

This patch is based on a patch made by John Fastabend.
It adds support for offloading cls_flower.
when NETIF_F_HW_TC is on:
  flags = 0       => Rule will be processed twice - by hardware, and if
                     still relevant, by software.
  flags = SKIP_HW => Rull will be processed by software only

If hardware fail/not capabale to apply the rule, operation will NOT
fail. Filter will be processed by SW only.
Acked-by: NJiri Pirko <jiri@mellanox.com>
Suggested-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NAmir Vadai <amir@vadai.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5b33f488

net: dsa: Fix cleanup resources upon module removal · 04761890

由 Neil Armstrong 提交于 3月 08, 2016

The initial commit badly merged into the dsa_resume method instead
of the dsa_remove_dst method.
As consequence, the dst->master_netdev->dsa_ptr is not set to NULL on
removal and re-bind of the dsa device fails with error -17.

Fixes: b0dc635d ("net: dsa: cleanup resources upon module removal ")
Signed-off-by: NNeil Armstrong <narmstrong@baylibre.com>
Acked-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04761890

10 3月, 2016 13 次提交

packet: validate variable length ll headers · 9ed988cd

由 Willem de Bruijn 提交于 3月 09, 2016

Replace link layer header validation check ll_header_truncate with
more generic dev_validate_header.

Validation based on hard_header_len incorrectly drops valid packets
in variable length protocols, such as AX25. dev_validate_header
calls header_ops.validate for such protocols to ensure correctness
below hard_header_len.

See also http://comments.gmane.org/gmane.linux.network/401064

Fixes 9c707762 ("packet: make packet_snd fail on len smaller than l2 header")
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ed988cd

ax25: add link layer header validation function · ea47781c

由 Willem de Bruijn 提交于 3月 09, 2016

As variable length protocol, AX25 fails link layer header validation
tests based on a minimum length. header_ops.validate allows protocols
to validate headers that are shorter than hard_header_len. Implement
this callback for AX25.

See also http://comments.gmane.org/gmane.linux.network/401064Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea47781c

kcm: Add receive message timeout · 29152a34

由 Tom Herbert 提交于 3月 07, 2016

This patch adds receive timeout for message assembly on the attached TCP
sockets. The timeout is set when a new messages is started and the whole
message has not been received by TCP (not in the receive queue). If the
completely message is subsequently received the timer is cancelled, if the
timer expires the RX side is aborted.

The timeout value is taken from the socket timeout (SO_RCVTIMEO) that is
set on a TCP socket (i.e. set by get sockopt before attaching a TCP socket
to KCM.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29152a34

kcm: Add memory limit for receive message construction · 7ced95ef

由 Tom Herbert 提交于 3月 07, 2016

Message assembly is performed on the TCP socket. This is logically
equivalent of an application that performs a peek on the socket to find
out how much memory is needed for a receive buffer. The receive socket
buffer also provides the maximum message size which is checked.

The receive algorithm is something like:

   1) Receive the first skbuf for a message (or skbufs if multiple are
      needed to determine message length).
   2) Check the message length against the number of bytes in the TCP
      receive queue (tcp_inq()).
	- If all the bytes of the message are in the queue (incluing the
	  skbuf received), then proceed with message assembly (it should
	  complete with the tcp_read_sock)
        - Else, mark the psock with the number of bytes needed to
	  complete the message.
   3) In TCP data ready function, if the psock indicates that we are
      waiting for the rest of the bytes of a messages, check the number
      of queued bytes against that.
        - If there are still not enough bytes for the message, just
	  return
        - Else, clear the waiting bytes and proceed to receive the
	  skbufs.  The message should now be received in one
	  tcp_read_sock
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7ced95ef

kcm: Sendpage support · f29698fc

由 Tom Herbert 提交于 3月 07, 2016

Implement kcm_sendpage. Set in sendpage to kcm_sendpage in both
dgram and seqpacket ops.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f29698fc

kcm: Splice support · 91687355

由 Tom Herbert 提交于 3月 07, 2016

Implement kcm_splice_read. This is supported only for seqpacket.
Add kcm_seqpacket_ops and set splice read to kcm_splice_read.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91687355

kcm: Add statistics and proc interfaces · cd6e111b

由 Tom Herbert 提交于 3月 07, 2016

This patch adds various counters for KCM. These include counters for
messages and bytes received or sent, as well as counters for number of
attached/unattached TCP sockets and other error or edge events.

The statistics are exposed via a proc interface. /proc/net/kcm provides
statistics per KCM socket and per psock (attached TCP sockets).
/proc/net/kcm_stats provides aggregate statistics.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd6e111b

kcm: Kernel Connection Multiplexor module · ab7ac4eb

由 Tom Herbert 提交于 3月 07, 2016

This module implements the Kernel Connection Multiplexor.

Kernel Connection Multiplexor (KCM) is a facility that provides a
message based interface over TCP for generic application protocols.
With KCM an application can efficiently send and receive application
protocol messages over TCP using datagram sockets.

For more information see the included Documentation/networking/kcm.txt
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab7ac4eb

tcp: Add tcp_inq to get available receive bytes on socket · 473bd239

由 Tom Herbert 提交于 3月 07, 2016

Create a common kernel function to get the number of bytes available
on a TCP socket. This is based on code in INQ getsockopt and we now call
the function for that getsockopt.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

473bd239

net: Walk fragments in __skb_splice_bits · fa9835e5

由 Tom Herbert 提交于 3月 07, 2016

Add walking of fragments in __skb_splice_bits.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa9835e5

net: Add MSG_BATCH flag · f092276d

由 Tom Herbert 提交于 3月 07, 2016

Add a new msg flag called MSG_BATCH. This flag is used in sendmsg to
indicate that more messages will follow (i.e. a batch of messages is
being sent). This is similar to MSG_MORE except that the following
messages are not merged into one packet, they are sent individually.
sendmmsg is updated so that each contained message except for the
last one is marked as MSG_BATCH.

MSG_BATCH is a performance optimization in cases where a socket
implementation can benefit by transmitting packets in a batch.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f092276d

net: Allow MSG_EOR in each msghdr of sendmmsg · 28a94d8f

由 Tom Herbert 提交于 3月 07, 2016

This patch allows setting MSG_EOR in each individual msghdr passed
in sendmmsg. This allows a sendmmsg to send multiple messages when
using SOCK_SEQPACKET.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

28a94d8f

net: Make sock_alloc exportable · f4a00aac

由 Tom Herbert 提交于 3月 07, 2016

Export it for cases where we want to create sockets by hand.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f4a00aac

09 3月, 2016 7 次提交

ipv6: per netns FIB garbage collection · 3dc94f93

由 Michal Kubeček 提交于 3月 08, 2016

One of our customers observed issues with FIB6 garbage collectors
running in different network namespaces blocking each other, resulting
in soft lockups (fib6_run_gc() initiated from timer runs always in
forced mode).

Now that FIB6 walkers are separated per namespace, there is no more need
for instances of fib6_run_gc() in different namespaces blocking each
other. There is still a call to icmp6_dst_gc() which operates on shared
data but this function is protected by its own shared lock.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3dc94f93

ipv6: per netns fib6 walkers · 9a03cd8f

由 Michal Kubeček 提交于 3月 08, 2016

The IPv6 FIB data structures are separated per network namespace but
there is still only one global walkers list and one global walker list
lock. This means changes in one namespace unnecessarily interfere with
walkers in other namespaces.

Replace the global list with per-netns lists (and give each its own
lock).
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a03cd8f

ipv6: replace global gc_args with local variable · 3570df91

由 Michal Kubeček 提交于 3月 08, 2016

Global variable gc_args is only used in fib6_run_gc() and functions
called from it. As fib6_run_gc() makes sure there is at most one
instance of fib6_clean_all() running at any moment, we can replace
gc_args with a local variable which will be needed once multiple
instances (per netns) of garbage collector are allowed.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3570df91

net_sched: dsmark: use qdisc_dequeue_peeked() · f8b33d8e

由 Kyeong Yoo 提交于 3月 07, 2016

This fix is for dsmark similar to commit 3557619f
("net_sched: prio: use qdisc_dequeue_peeked")
and makes use of qdisc_dequeue_peeked() instead of direct dequeue() call.

First time, wrr peeks dsmark, which will then peek into sfq.
sfq dequeues an skb and it's stored in sch->gso_skb.
Next time, wrr tries to dequeue from dsmark, which will call sfq dequeue
directly. This results skipping the previously peeked skb.

So changed dsmark dequeue to call qdisc_dequeue_peeked() instead to use
peeked skb if exists.
Signed-off-by: NKyeong Yoo <kyeong.yoo@alliedtelesis.co.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8b33d8e

bpf, vxlan, geneve, gre: fix usage of dst_cache on xmit · db3c6139

由 Daniel Borkmann 提交于 3月 04, 2016

The assumptions from commit 0c1d70af ("net: use dst_cache for vxlan
device"), 468dfffc ("geneve: add dst caching support") and 3c1cb4d2
("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage
when ip_tunnel_info is used is unfortunately not always valid as assumed.

While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF
however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre
with different remote dsts, tos, etc, therefore they cannot be assumed as
packet independent.

Right now vxlan, geneve, gre would cache the dst for eBPF and every packet
would reuse the same entry that was first created on the initial route
lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have
a different one.

Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test
in vxlan needs to be handeled differently in this context as it is currently
inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable()
helper is added for the three tunnel cases, which checks if we can use dst
cache.

Fixes: 0c1d70af ("net: use dst_cache for vxlan device")
Fixes: 468dfffc ("geneve: add dst caching support")
Fixes: 3c1cb4d2 ("net/ipv4: add dst cache support for gre lwtunnels")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db3c6139

bpf: support for access to tunnel options · 14ca0751

由 Daniel Borkmann 提交于 3月 04, 2016

After eBPF being able to programmatically access/manage tunnel key meta
data via commit d3aa45ce ("bpf: add helpers to access tunnel metadata")
and more recently also for IPv6 through c6c33454 ("bpf: support ipv6
for bpf_skb_{set,get}_tunnel_key"), this work adds two complementary
helpers to generically access their auxiliary tunnel options.

Geneve and vxlan support this facility. For geneve, TLVs can be pushed,
and for the vxlan case its GBP extension. I.e. setting tunnel key for geneve
case only makes sense, if we can also read/write TLVs into it. In the GBP
case, it provides the flexibility to easily map the group policy ID in
combination with other helpers or maps.

I chose to model this as two separate helpers, bpf_skb_{set,get}_tunnel_opt(),
for a couple of reasons. bpf_skb_{set,get}_tunnel_key() is already rather
complex by itself, and there may be cases for tunnel key backends where
tunnel options are not always needed. If we would have integrated this
into bpf_skb_{set,get}_tunnel_key() nevertheless, we are very limited with
remaining helper arguments, so keeping compatibility on structs in case of
passing in a flat buffer gets more cumbersome. Separating both also allows
for more flexibility and future extensibility, f.e. options could be fed
directly from a map, etc.

Moreover, change geneve's xmit path to test only for info->options_len
instead of TUNNEL_GENEVE_OPT flag. This makes it more consistent with vxlan's
xmit path and allows for avoiding to specify a protocol flag in the API on
xmit, so it can be protocol agnostic. Having info->options_len is enough
information that is needed. Tested with vxlan and geneve.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

14ca0751

bpf: allow to propagate df in bpf_skb_set_tunnel_key · 22080870

由 Daniel Borkmann 提交于 3月 04, 2016

Added by 9a628224 ("ip_tunnel: Add dont fragment flag."), allow to
feed df flag into tunneling facilities (currently supported on TX by
vxlan, geneve and gre) as a hint from eBPF's bpf_skb_set_tunnel_key()
helper.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22080870