提交 · c5723ac51f2a1eaa69d831b919129a3c2b7efeee · openeuler / raspberrypi-kernel

12 8月, 2015 1 次提交

Revert "Merge branch 'mv88e6xxx-switchdev-fdb'" · cdf09697

由 David S. Miller 提交于 8月 11, 2015

This reverts commit f1d5ca43, reversing
changes made to 4933d85c.

I applied v2 instead of v3.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cdf09697

11 8月, 2015 4 次提交

gre: Remove support for sharing GRE protocol hook. · 9f57c67c

由 Pravin B Shelar 提交于 8月 07, 2015

Support for sharing GREPROTO_CISCO port was added so that
OVS gre port and kernel GRE devices can co-exist. After
flow-based tunneling patches OVS GRE protocol processing
is completely moved to ip_gre module. so there is no need
for GRE protocol hook. Following patch consolidates
GRE protocol related functions into ip_gre module.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f57c67c

openvswitch: Use regular GRE net_device instead of vport · b2acd1dc

由 Pravin B Shelar 提交于 8月 07, 2015

Using GRE tunnel meta data collection feature, we can implement
OVS GRE vport. This patch removes all of the OVS
specific GRE code and make OVS use a ip_gre net_device.
Minimal GRE vport is kept to handle compatibility with
current userspace application.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b2acd1dc

ip_gre: Add support to collect tunnel metadata. · 2e15ea39

由 Pravin B Shelar 提交于 8月 07, 2015

Following patch create new tunnel flag which enable
tunnel metadata collection on given device.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2e15ea39

net: add explicit logging and stat for neighbour table overflow · fb811395

由 Rick Jones 提交于 8月 07, 2015

Add an explicit neighbour table overflow message (ratelimited) and
statistic to make diagnosing neighbour table overflows tractable in
the wild.

Diagnosing a neighbour table overflow can be quite difficult in the wild
because there is no explicit dmesg logged.  Callers to neighbour code
seem to use net_dbg_ratelimit when the neighbour call fails which means
the "base message" is not emitted and the callback suppressed messages
from the ratelimiting can end-up juxtaposed with unrelated messages.
Further, a forced garbage collection will increment a stat on each call
whether it was successful in freeing-up a table entry or not, so that
statistic is only a hint.  So, add a net_info_ratelimited message and
explicit statistic to the neighbour code.
Signed-off-by: NRick Jones <rick.jones2@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb811395

10 8月, 2015 3 次提交

net: dsa: add support for switchdev FDB objects · 55045ddd

由 Vivien Didelot 提交于 8月 06, 2015

Remove the fdb_{add,del,getnext} function pointer in favor of new
port_fdb_{add,del,getnext}.

Implement the switchdev_port_obj_{add,del,dump} functions in DSA to
support the SWITCHDEV_OBJ_PORT_FDB objects.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

55045ddd

net: switchdev: support static FDB addresses · 89024826

由 Vivien Didelot 提交于 8月 06, 2015

This patch adds a is_static boolean to the switchdev_obj_fdb structure,
in order to set the ndm_state to either NUD_NOARP or NUD_REACHABLE.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89024826

net: switchdev: change fdb addr for a byte array · 1525c386

由 Vivien Didelot 提交于 8月 06, 2015

The address in the switchdev_obj_fdb structure is currently represented
as a pointer. Replacing it for a 6-byte array allows switchdev to carry
addresses directly read from hardware registers, not stored by the
switch chip driver (as in Rocker).
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1525c386

08 8月, 2015 1 次提交

vxlan: combine VXLAN_FLOWBASED into VXLAN_COLLECT_METADATA · da8b43c0

由 Alexei Starovoitov 提交于 8月 04, 2015

IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA,
so combine them into single IFLA_VXLAN_COLLECT_METADATA flag.
'flowbased' doesn't convey real meaning of the vxlan tunnel mode.
This mode can be used by routing, tc+bpf and ovs.
Only ovs is strictly flow based, so 'collect metadata' is a better
name for this tunnel mode.
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

da8b43c0

01 8月, 2015 5 次提交

ipv6: Enable auto flow labels by default · b5677416

由 Tom Herbert 提交于 7月 31, 2015

Initialize auto_flowlabels to one. This enables automatic flow labels,
individual socket may disable them using the IPV6_AUTOFLOWLABEL socket
option.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5677416

ipv6: Implement different admin modes for automatic flow labels · 42240901

由 Tom Herbert 提交于 7月 31, 2015

Change the meaning of net.ipv6.auto_flowlabels to provide a mode for
automatic flow labels generation. There are four modes:

0: flow labels are disabled
1: flow labels are enabled, sockets can opt-out
2: flow labels are allowed, sockets can opt-in
3: flow labels are enabled and enforced, no opt-out for sockets

np->autoflowlabel is initialized according to the sysctl value.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42240901

ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel · 67800f9b

由 Tom Herbert 提交于 7月 31, 2015

We can't call skb_get_hash here since the packet is not complete to do
flow_dissector. Create hash based on flowi6 instead.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67800f9b

ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument · 343d60aa

由 Roopa Prabhu 提交于 7月 30, 2015

This patch adds net argument to ipv6_stub_impl.ipv6_dst_lookup
for use cases where sk is not available (like mpls).
sk appears to be needed to get the namespace 'net' and is optional
otherwise. This patch series changes ipv6_stub_impl.ipv6_dst_lookup
to take net argument. sk remains optional.

All callers of ipv6_stub_impl.ipv6_dst_lookup have been modified
to pass net. I have modified them to use already available
'net' in the scope of the call. I can change them to
sock_net(sk) to avoid any unintended change in behaviour if sock
namespace is different. They dont seem to be from code inspection.
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

343d60aa

bpf: add helpers to access tunnel metadata · d3aa45ce

由 Alexei Starovoitov 提交于 7月 30, 2015

Introduce helpers to let eBPF programs attached to TC manipulate tunnel metadata:
bpf_skb_[gs]et_tunnel_key(skb, key, size, flags)
skb: pointer to skb
key: pointer to 'struct bpf_tunnel_key'
size: size of 'struct bpf_tunnel_key'
flags: room for future extensions

First eBPF program that uses these helpers will allocate per_cpu
metadata_dst structures that will be used on TX.
On RX metadata_dst is allocated by tunnel driver.

Typical usage for TX:
struct bpf_tunnel_key tkey;
... populate tkey ...
bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0);
bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);

RX:
struct bpf_tunnel_key tkey = {};
bpf_skb_get_tunnel_key(skb, &tkey, sizeof(tkey), 0);
... lookup or redirect based on tkey ...

'struct bpf_tunnel_key' will be extended in the future by adding
elements to the end and the 'size' argument will indicate which fields
are populated, thereby keeping backwards compatibility.
The 'flags' argument may be used as well when the 'size' is not enough or
to indicate completely different layout of bpf_tunnel_key.
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3aa45ce

31 7月, 2015 1 次提交

net: sched: fix refcount imbalance in actions · 28e6b67f

由 Daniel Borkmann 提交于 7月 29, 2015

Since commit 55334a5d ("net_sched: act: refuse to remove bound action
outside"), we end up with a wrong reference count for a tc action.

Test case 1:

  FOO="1,6 0 0 4294967295,"
  BAR="1,6 0 0 4294967294,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \
     action bpf bytecode "$FOO"
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 1 bind 1
  tc actions replace action bpf bytecode "$BAR" index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
    index 1 ref 2 bind 1
  tc actions replace action bpf bytecode "$FOO" index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 3 bind 1

Test case 2:

  FOO="1,6 0 0 4294967295,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
  tc actions show action gact
    action order 0: gact action pass
    random type none pass val 0
     index 1 ref 1 bind 1
  tc actions add action drop index 1
    RTNETLINK answers: File exists [...]
  tc actions show action gact
    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 2 bind 1
  tc actions add action drop index 1
    RTNETLINK answers: File exists [...]
  tc actions show action gact
    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 3 bind 1

What happens is that in tcf_hash_check(), we check tcf_common for a given
index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've
found an existing action. Now there are the following cases:

  1) We do a late binding of an action. In that case, we leave the
     tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init()
     handler. This is correctly handeled.

  2) We replace the given action, or we try to add one without replacing
     and find out that the action at a specific index already exists
     (thus, we go out with error in that case).

In case of 2), we have to undo the reference count increase from
tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to
do so because of the 'tcfc_bindcnt > 0' check which bails out early with
an -EPERM error.

Now, while commit 55334a5d prevents 'tc actions del action ...' on an
already classifier-bound action to drop the reference count (which could
then become negative, wrap around etc), this restriction only accounts for
invocations outside a specific action's ->init() handler.

One possible solution would be to add a flag thus we possibly trigger
the -EPERM ony in situations where it is indeed relevant.

After the patch, above test cases have correct reference count again.

Fixes: 55334a5d ("net_sched: act: refuse to remove bound action outside")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NCong Wang <cwang@twopensource.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

28e6b67f

30 7月, 2015 3 次提交

lwtunnel: Make lwtun_encaps[] static · 92a99bf3

由 Thomas Graf 提交于 7月 29, 2015

Any external user should use the registration API instead of
accessing this directly.

Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92a99bf3

net: Recompute sk_txhash on negative routing advice · 265f94ff

由 Tom Herbert 提交于 7月 28, 2015

When a connection is failing a transport protocol calls
dst_negative_advice to try to get a better route. This patch includes
changing the sk_txhash in that function. This provides a rudimentary
method to try to find a different path in the network since sk_txhash
affects ECMP on the local host and through the network (via flow labels
or UDP source port in encapsulation).
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

265f94ff

net: Set sk_txhash from a random number · 877d1f62

由 Tom Herbert 提交于 7月 28, 2015

This patch creates sk_set_txhash and eliminates protocol specific
inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
random number instead of performing flow dissection. sk_set_txash
is also allowed to be called multiple times for the same socket,
we'll need this when redoing the hash for negative routing advice.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

877d1f62

27 7月, 2015 6 次提交

tcp: fix recv with flags MSG_WAITALL | MSG_PEEK · dfbafc99

由 Sabrina Dubroca 提交于 7月 24, 2015

Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called
with flags = MSG_WAITALL | MSG_PEEK.

sk_wait_data waits for sk_receive_queue not empty, but in this case,
the receive queue is not empty, but does not contain any skb that we
can use.

Add a "last skb seen on receive queue" argument to sk_wait_data, so
that it sleeps until the receive queue has new skbs.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258Reported-by: NEnrico Scholz <rh-bugzilla@ensc.de>
Reported-by: NDan Searle <dan@censornet.com>
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dfbafc99

bonding: convert num_grat_arp to the new bonding option API · 205845a3

由 Nikolay Aleksandrov 提交于 7月 24, 2015

num_grat_arp wasn't converted to the new bonding option API, so do this
now and remove the specific sysfs store option in order to use the
standard one. num_grat_arp is the same as num_unsol_na so add it as an
alias with the same option settings. An important difference is the option
name which is matched in bond_sysfs_store_option().
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: NVeaceslav Falico <vfalico@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

205845a3

lwtunnel: change prototype of lwtunnel_state_get() · 5a6228a0

由 Nicolas Dichtel 提交于 7月 24, 2015

It saves some lines and simplify a bit the code when the state is returning
by this function. It's also useful to handle a NULL entry.

To avoid too long lines, I've also renamed lwtunnel_state_get() and
lwtunnel_state_put() to lwtstate_get() and lwtstate_put().

CC: Thomas Graf <tgraf@suug.ch>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a6228a0

inet: frags: remove INET_FRAG_EVICTED and use list_evictor for the test · caaecdd3

由 Nikolay Aleksandrov 提交于 7月 23, 2015

We can simply remove the INET_FRAG_EVICTED flag to avoid all the flags
race conditions with the evictor and use a participation test for the
evictor list, when we're at that point (after inet_frag_kill) in the
timer there're 2 possible cases:

1. The evictor added the entry to its evictor list while the timer was
waiting for the chainlock
or
2. The timer unchained the entry and the evictor won't see it

In both cases we should be able to see list_evictor correctly due
to the sync on the chainlock.

Joint work with Florian Westphal.
Tested-by: NFrank Schreuder <fschreuder@transip.nl>
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

caaecdd3

inet: frag: change *_frag_mem_limit functions to take netns_frags as argument · 0e60d245

由 Florian Westphal 提交于 7月 23, 2015

Followup patch will call it after inet_frag_queue was freed, so q->net
doesn't work anymore (but netf = q->net; free(q); mem_limit(netf) would).
Tested-by: NFrank Schreuder <fschreuder@transip.nl>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e60d245

inet: frag: don't re-use chainlist for evictor · d1fe1944

由 Florian Westphal 提交于 7月 23, 2015

commit 65ba1f1e ("inet: frags: fix a race between inet_evict_bucket
and inet_frag_kill") describes the bug, but the fix doesn't work reliably.

Problem is that ->flags member can be set on other cpu without chainlock
being held by that task, i.e. the RMW-Cycle can clear INET_FRAG_EVICTED
bit after we put the element on the evictor private list.

We can crash when walking the 'private' evictor list since an element can
be deleted from list underneath the evictor.

Join work with Nikolay Alexandrov.

Fixes: b13d3cbf ("inet: frag: move eviction of queues to work queue")
Reported-by: NJohan Schuijt <johan@transip.nl>
Tested-by: NFrank Schreuder <fschreuder@transip.nl>
Signed-off-by: NNikolay Alexandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1fe1944

25 7月, 2015 1 次提交

ipv4: consider TOS in fib_select_default · 2392debc

由 Julian Anastasov 提交于 7月 22, 2015

fib_select_default considers alternative routes only when
res->fi is for the first alias in res->fa_head. In the
common case this can happen only when the initial lookup
matches the first alias with highest TOS value. This
prevents the alternative routes to require specific TOS.

This patch solves the problem as follows:

- routes that require specific TOS should be returned by
fib_select_default only when TOS matches, as already done
in fib_table_lookup. This rule implies that depending on the
TOS we can have many different lists of alternative gateways
and we have to keep the last used gateway (fa_default) in first
alias for the TOS instead of using single tb_default value.

- as the aliases are ordered by many keys (TOS desc,
fib_priority asc), we restrict the possible results to
routes with matching TOS and lowest metric (fib_priority)
and routes that match any TOS, again with lowest metric.

For example, packet with TOS 8 can not use gw3 (not lowest
metric), gw4 (different TOS) and gw6 (not lowest metric),
all other gateways can be used:

tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
tos 8 via gw2 metric 2
tos 8 via gw3 metric 3
tos 4 via gw4
tos 0 via gw5
tos 0 via gw6 metric 1
Reported-by: NHagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2392debc

23 7月, 2015 7 次提交

Bluetooth: hci_check_conn_params() check proper range · 8757825b

由 Seungyoun Ju 提交于 7月 13, 2015

Slave latency range has been changed in Core Spec. 4.2 by Erratum 5419
of ESR08_V1.0.0. And it should be applied to Core Spec. 4.0 and 4.1.

Before:
   connSlaveLatency <= ((connSupervisionTimeout / connIntervalMax) - 1)

After:
   connSlaveLatency <= ((connSupervisionTimeout / (connIntervalMax*2)) - 1)

This patch makes hci_check_conn_params() check the allowable slave
latency range using the changed way.
Signed-off-by: NSeungyoun Ju <sy39.ju@samsung.com>
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>

8757825b

Bluetooth: __l2cap_wait_ack() add defensive timeout · e432c72c

由 Dean Jenkins 提交于 6月 23, 2015

Add a timeout to prevent the do while loop running in an
infinite loop. This ensures that the channel will be
instructed to close within 10 seconds so prevents
l2cap_sock_shutdown() getting stuck forever.

Returns -ENOLINK when the timeout is reached. The channel
will be subequently closed and not all data will be ACK'ed.
Signed-off-by: NDean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

e432c72c

Bluetooth: __l2cap_wait_ack() use msecs_to_jiffies() · cb02a255

由 Dean Jenkins 提交于 6月 23, 2015

Use msecs_to_jiffies() instead of using HZ so that it
is easier to specify the time in milliseconds.

Also add a #define L2CAP_WAIT_ACK_POLL_PERIOD to specify the 200ms
polling period so that it is defined in a single place.
Signed-off-by: NDean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

cb02a255

mac802154: do not export ieee802154_rx() · 729a8989

由 Varka Bhadram 提交于 7月 07, 2015

Right now there are no other users for ieee802154_rx()
in kernel. So lets remove EXPORT_SYMBOL() for this.

Also it moves the function prototype from global header
file to local header file.
Signed-off-by: NVarka Bhadram <varkabhadram@gmail.com>
Acked-by: NAlexander Aring <alex.aring@gmail.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

729a8989

cfg802154: add PM hooks · a6cb869b

由 Varka Bhadram 提交于 6月 24, 2015

This patch help to implement suspend/resume in mac802154, these
hooks will be run before the device is suspended and after it
resumes.
Signed-off-by: NVarka Bhadram <varkab@cdac.in>
Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

a6cb869b

ip_tunnel: Call ip_tunnel_core_init() from inet_init() · 045a0fa0

由 Thomas Graf 提交于 7月 23, 2015

Convert the module_init() to a invocation from inet_init() since
ip_tunnel_core is part of the INET built-in.

Fixes: 3093fbe7 ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

045a0fa0

ip_tunnel: Provide tunnel metadata API for CONFIG_INET=n · 05283187

由 Thomas Graf 提交于 7月 22, 2015

Account for the configuration FIB_RULES=y && INET=n as FIB_RULES can
be selected by IPV6 or DECNET without INET.

Fixes: e7030878 ("fib: Add fib rule match on tunnel id")
Fixes: 3093fbe7 ("route: Per route IP tunnel metadata via lightweight tunnel")
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

05283187

22 7月, 2015 8 次提交

net: #ifdefify sk_classid member of struct sock · e181a543

由 Mathias Krause 提交于 7月 19, 2015

The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is
enabled. #ifdefify it to reduce the size of struct sock on 32 bit
systems, at least.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e181a543

openvswitch: Use regular VXLAN net_device device · 614732ea

由 Thomas Graf 提交于 7月 21, 2015

This gets rid of all OVS specific VXLAN code in the receive and
transmit path by using a VXLAN net_device to represent the vport.
Only a small shim layer remains which takes care of handling the
VXLAN specific OVS Netlink configuration.

Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb()
since they are no longer needed.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

614732ea

vxlan: Factor out device configuration · 0dfbdf41

由 Thomas Graf 提交于 7月 21, 2015

This factors out the device configuration out of the RTNL newlink
API which allows for in-kernel creation of VXLAN net_devices.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0dfbdf41

fib: Add fib rule match on tunnel id · e7030878

由 Thomas Graf 提交于 7月 21, 2015

This add the ability to select a routing table based on the tunnel
id which allows to maintain separate routing tables for each virtual
tunnel network.

ip rule add from all tunnel-id 100 lookup 100
ip rule add from all tunnel-id 200 lookup 200

A new static key controls the collection of metadata at tunnel level
upon demand.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e7030878

route: Per route IP tunnel metadata via lightweight tunnel · 3093fbe7

由 Thomas Graf 提交于 7月 21, 2015

This introduces a new IP tunnel lightweight tunnel type which allows
to specify IP tunnel instructions per route. Only IPv4 is supported
at this point.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3093fbe7

route: Extend flow representation with tunnel key · 1b7179d3

由 Thomas Graf 提交于 7月 21, 2015

Add a new flowi_tunnel structure which is a subset of ip_tunnel_key to
allow routes to match on tunnel metadata. For now, the tunnel id is
added to flowi_tunnel which allows for routes to be bound to specific
virtual tunnels.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b7179d3

vxlan: Flow based tunneling · ee122c79

由 Thomas Graf 提交于 7月 21, 2015

Allows putting a VXLAN device into a new flow-based mode in which
skbs with a ip_tunnel_info dst metadata attached will be encapsulated
according to the instructions stored in there with the VXLAN device
defaults taken into consideration.

Similar on the receive side, if the VXLAN_F_COLLECT_METADATA flag is
set, the packet processing will populate a ip_tunnel_info struct for
each packet received and attach it to the skb using the new metadata
dst.  The metadata structure will contain the outer header and tunnel
header fields which have been stripped off. Layers further up in the
stack such as routing, tc or netfitler can later match on these fields
and perform forwarding. It is the responsibility of upper layers to
ensure that the flag is set if the metadata is needed. The flag limits
the additional cost of metadata collecting based on demand.

This prepares the VXLAN device to be steered by the routing and other
subsystems which allows to support encapsulation for a large number
of tunnel endpoints and tunnel ids through a single net_device which
improves the scalability.

It also allows for OVS to leverage this mode which in turn allows for
the removal of the OVS specific VXLAN code.

Because the skb is currently scrubed in vxlan_rcv(), the attachment of
the new dst metadata is postponed until after scrubing which requires
the temporary addition of a new member to vxlan_metadata. This member
is removed again in a later commit after the indirect VXLAN receive API
has been removed.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ee122c79

dst: Metadata destinations · f38a9eb1

由 Thomas Graf 提交于 7月 21, 2015

Introduces a new dst_metadata which enables to carry per packet metadata
between forwarding and processing elements via the skb->dst pointer.

The structure is set up to be a union. Thus, each separate type of
metadata requires its own dst instance. If demand arises to carry
multiple types of metadata concurrently, metadata dst entries can be
made stackable.

The metadata dst entry is refcnt'ed as expected for now but a non
reference counted use is possible if the reference is forced before
queueing the skb.

In order to allow allocating dsts with variable length, the existing
dst_alloc() is split into a dst_alloc() and dst_init() function. The
existing dst_init() function to initialize the subsystem is being
renamed to dst_subsys_init() to make it clear what is what.

The check before ip_route_input() is changed to ignore metadata dsts
and drop the dst inside the routing function thus allowing to interpret
metadata in a later commit.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f38a9eb1