提交 · 777b3e930ac8eb1f8360b3e4f2aaf5e4abe5ed46 · openanolis / cloud-kernel

12 2月, 2015 3 次提交

vxlan: Use checksum partial with remote checksum offload · 0ace2ca8

由 Tom Herbert 提交于 2月 10, 2015

Change remote checksum handling to set checksum partial as default
behavior. Added an iflink parameter to configure not using
checksum partial (calling csum_partial to update checksum).
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ace2ca8

net: Infrastructure for CHECKSUM_PARTIAL with remote checsum offload · 15e2396d

由 Tom Herbert 提交于 2月 10, 2015

This patch adds infrastructure so that remote checksum offload can
set CHECKSUM_PARTIAL instead of calling csum_partial and writing
the modfied checksum field.

Add skb_remcsum_adjust_partial function to set an skb for using
CHECKSUM_PARTIAL with remote checksum offload.  Changed
skb_remcsum_process and skb_gro_remcsum_process to take a boolean
argument to indicate if checksum partial can be set or the
checksum needs to be modified using the normal algorithm.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

15e2396d

net: Fix remcsum in GRO path to not change packet · 26c4f7da

由 Tom Herbert 提交于 2月 10, 2015

Remote checksum offload processing is currently the same for both
the GRO and non-GRO path. When the remote checksum offload option
is encountered, the checksum field referred to is modified in
the packet. So in the GRO case, the packet is modified in the
GRO path and then the operation is skipped when the packet goes
through the normal path based on skb->remcsum_offload. There is
a problem in that the packet may be modified in the GRO path, but
then forwarded off host still containing the remote checksum option.
A remote host will again perform RCO but now the checksum verification
will fail since GRO RCO already modified the checksum.

To fix this, we ensure that GRO restores a packet to it's original
state before returning. In this model, when GRO processes a remote
checksum option it still changes the checksum per the algorithm
but on return from lower layer processing the checksum is restored
to its original value.

In this patch we add define gro_remcsum structure which is passed
to skb_gro_remcsum_process to save offset and delta for the checksum
being changed. After lower layer processing, skb_gro_remcsum_cleanup
is called to restore the checksum before returning from GRO.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26c4f7da

09 2月, 2015 1 次提交

vxlan: Wrong type passed to %pIS · a4870f79

由 Rasmus Villemoes 提交于 2月 07, 2015

src_ip is a pointer to a union vxlan_addr, one member of which is a
struct sockaddr. Passing a pointer to src_ip is wrong; one should pass
the value of src_ip itself. Since %pIS formally expects something of
type struct sockaddr*, let's pass a pointer to the appropriate union
member, though this of course doesn't change the generated code.

Fixes: e4c7ed41 ("vxlan: add ipv6 support")
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a4870f79

05 2月, 2015 2 次提交

vxlan: Only set has-GBP bit in header if any other bits would be set · db79a621

由 Thomas Graf 提交于 2月 04, 2015

This allows for a VXLAN-GBP socket to talk to a Linux VXLAN socket by
not setting any of the bits.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db79a621

net: add skb functions to process remote checksum offload · dcdc8994

由 Tom Herbert 提交于 2月 02, 2015

This patch adds skb_remcsum_process and skb_gro_remcsum_process to
perform the appropriate adjustments to the skb when receiving
remote checksum offload.

Updated vxlan and gue to use these functions.

Tested: Ran TCP_RR and TCP_STREAM netperf for VXLAN and GUE, did
not see any change in performance.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dcdc8994

30 1月, 2015 1 次提交

vxlan: setup the right link netns in newlink hdlr · 33564bbb

由 Nicolas Dichtel 提交于 1月 26, 2015

Rename the netns to src_net to avoid confusion with the netns where the
interface stands. The user may specify IFLA_NET_NS_[PID|FD] to create
a x-netns netndevice: IFLA_NET_NS_[PID|FD] points to the netns where the
netdevice stands and src_net to the link netns.

Note that before commit f01ec1c0 ("vxlan: add x-netns support"), it was
possible to create a x-netns vxlan netdevice, but the netdevice was not
operational.

Fixes: f01ec1c0 ("vxlan: add x-netns support")
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33564bbb

28 1月, 2015 1 次提交

vxlan: advertise link netns in fdb messages · 4967082b

由 Nicolas Dichtel 提交于 1月 26, 2015

Previous commit is based on a wrong assumption, fdb messages are always sent
into the netns where the interface stands (see vxlan_fdb_notify()).

These fdb messages doesn't embed the rtnl attribute IFLA_LINK_NETNSID, thus we
need to add it (useful to interpret NDA_IFINDEX or NDA_DST for example).

Note also that vxlan_nlmsg_size() was not updated.

Fixes: 193523bf ("vxlan: advertise netns of vxlan dev in fdb msg")
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4967082b

25 1月, 2015 2 次提交

vxlan: Eliminate dependency on UDP socket in transmit path · af33c1ad

由 Tom Herbert 提交于 1月 20, 2015

In the vxlan transmit path there is no need to reference the socket
for a tunnel which is needed for the receive side. We do, however,
need the vxlan_dev flags. This patch eliminate references
to the socket in the transmit path, and changes VXLAN_F_UNSHAREABLE
to be VXLAN_F_RCV_FLAGS. This mask is used to store the flags
applicable to receive (GBP, CSUM6_RX, and REMCSUM_RX) in the
vxlan_sock flags.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af33c1ad

udp: Do not require sock in udp_tunnel_xmit_skb · d998f8ef

由 Tom Herbert 提交于 1月 20, 2015

The UDP tunnel transmit functions udp_tunnel_xmit_skb and
udp_tunnel6_xmit_skb include a socket argument. The socket being
passed to the functions (from VXLAN) is a UDP created for receive
side. The only thing that the socket is used for in the transmit
functions is to get the setting for checksum (enabled or zero).
This patch removes the argument and and adds a nocheck argument
for checksum setting. This eliminates the unnecessary dependency
on a UDP socket for UDP tunnel transmit.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d998f8ef

24 1月, 2015 1 次提交

vxlan: advertise netns of vxlan dev in fdb msg · 193523bf

由 Nicolas Dichtel 提交于 1月 20, 2015

Netlink FDB messages are sent in the link netns. The header of these messages
contains the ifindex (ndm_ifindex) of the netdevice, but this ifindex is
unusable in case of x-netns vxlan.
I named the new attribute NDA_NDM_IFINDEX_NETNSID, to avoid confusion with
NDA_IFINDEX.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

193523bf

20 1月, 2015 1 次提交

tunnels: advertise link netns via netlink · 1728d4fa

由 Nicolas Dichtel 提交于 1月 15, 2015

Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1728d4fa

18 1月, 2015 1 次提交

netlink: make nlmsg_end() and genlmsg_end() void · 053c095a

由 Johannes Berg 提交于 1月 16, 2015

Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

  if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

  return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

  if (my_function(...))
    /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

-	return nlmsg_end(...);
+	nlmsg_end(...);
+	return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

053c095a

15 1月, 2015 4 次提交

vxlan: Only bind to sockets with compatible flags enabled · ac5132d1

由 Thomas Graf 提交于 1月 15, 2015

A VXLAN net_device looking for an appropriate socket may only consider
a socket which has a matching set of flags/extensions enabled. If
incompatible flags are enabled, return a conflict to have the caller
create a distinct socket with distinct port.

The OVS VXLAN port is kept unaware of extensions at this point.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac5132d1

vxlan: Group Policy extension · 3511494c

由 Thomas Graf 提交于 1月 15, 2015

Implements supports for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata is mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.

The group membership is defined by the lower 16 bits of skb->mark, the
upper 16 bits are used for flags.

SELinux allows to manage label to secure local resources. However,
distributed applications require ACLs to implemented across hosts. This
is typically achieved by matching on L2-L4 fields to identify the
original sending host and process on the receiver. On top of that,
netlabel and specifically CIPSO [1] allow to map security contexts to
universal labels.  However, netlabel and CIPSO are relatively complex.
This patch provides a lightweight alternative for overlay network
environments with a trusted underlay. No additional control protocol
is required.

           Host 1:                       Host 2:

      Group A        Group B        Group B     Group A
      +-----+   +-------------+    +-------+   +-----+
      | lxc |   | SELinux CTX |    | httpd |   | VM  |
      +--+--+   +--+----------+    +---+---+   +--+--+
	  \---+---/                     \----+---/
	      |                              |
	  +---+---+                      +---+---+
	  | vxlan |                      | vxlan |
	  +---+---+                      +---+---+
	      +------------------------------+

Backwards compatibility:
A VXLAN-GBP socket can receive standard VXLAN frames and will assign
the default group 0x0000 to such frames. A Linux VXLAN socket will
drop VXLAN-GBP  frames. The extension is therefore disabled by default
and needs to be specifically enabled:

   ip link add [...] type vxlan [...] gbp

In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
must run on a separate port number.

Examples:
 iptables:
  host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
  host2# iptables -I INPUT -m mark --mark 0x200 -j DROP

 OVS:
  # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
  # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'

[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] http://lwn.net/Articles/204905/Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3511494c

vxlan: Remote checksum offload · dfd8645e

由 Tom Herbert 提交于 1月 12, 2015

Add support for remote checksum offload in VXLAN. This uses a
reserved bit to indicate that RCO is being done, and uses the low order
reserved eight bits of the VNI to hold the start and offset values in a
compressed manner.

Start is encoded in the low order seven bits of VNI. This is start >> 1
so that the checksum start offset is 0-254 using even values only.
Checksum offset (transport checksum field) is indicated in the high
order bit in the low order byte of the VNI. If the bit is set, the
checksum field is for UDP (so offset = start + 6), else checksum
field is for TCP (so offset = start + 16). Only TCP and UDP are
supported in this implementation.

Remote checksum offload for VXLAN is described in:

https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).

With UDP checksums and Remote Checksum Offload
  IPv4
      Client
        11.84% CPU utilization
      Server
        12.96% CPU utilization
      9197 Mbps
  IPv6
      Client
        12.46% CPU utilization
      Server
        14.48% CPU utilization
      8963 Mbps

With UDP checksums, no remote checksum offload
  IPv4
      Client
        15.67% CPU utilization
      Server
        14.83% CPU utilization
      9094 Mbps
  IPv6
      Client
        16.21% CPU utilization
      Server
        14.32% CPU utilization
      9058 Mbps

No UDP checksums
  IPv4
      Client
        15.03% CPU utilization
      Server
        23.09% CPU utilization
      9089 Mbps
  IPv6
      Client
        16.18% CPU utilization
      Server
        26.57% CPU utilization
       8954 Mbps
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dfd8645e

udp: pass udp_offload struct to UDP gro callbacks · a2b12f3c

由 Tom Herbert 提交于 1月 12, 2015

This patch introduces udp_offload_callbacks which has the same
GRO functions (but not a GSO function) as offload_callbacks,
except there is an argument to a udp_offload struct passed to
gro_receive and gro_complete functions. This additional argument
can be used to retrieve the per port structure of the encapsulation
for use in gro processing (mostly by doing container_of on the
structure).
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2b12f3c

14 1月, 2015 1 次提交

net: rename vlan_tx_* helpers since "tx" is misleading there · df8a39de

由 Jiri Pirko 提交于 1月 13, 2015

The same macros are used for rx as well. So rename it.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df8a39de

13 1月, 2015 1 次提交

vxlan: Improve support for header flags · 3bf39475

由 Tom Herbert 提交于 1月 08, 2015

This patch cleans up the header flags of VXLAN in anticipation of
defining some new ones:

- Move header related definitions from vxlan.c to vxlan.h
- Change VXLAN_FLAGS to be VXLAN_HF_VNI (only currently defined flag)
- Move check for unknown flags to after we find vxlan_sock, this
  assumes that some flags may be processed based on tunnel
  configuration
- Add a comment about why the stack treating unknown set flags as an
  error instead of ignoring them
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3bf39475

03 1月, 2015 1 次提交

net: Add Transparent Ethernet Bridging GRO support. · 9b174d88

由 Jesse Gross 提交于 12月 30, 2014

Currently the only tunnel protocol that supports GRO with encapsulated
Ethernet is VXLAN. This pulls out the Ethernet code into a proper layer
so that it can be used by other tunnel protocols such as GRE and Geneve.
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9b174d88

24 12月, 2014 1 次提交

vxlan: Fix double free of skb. · 74f47278

由 Pravin B Shelar 提交于 12月 23, 2014

In case of error vxlan_xmit_one() can free already freed skb.
Also fixes memory leak of dst-entry.

Fixes: acbf74a7 ("vxlan: Refactor vxlan driver to make use
of the common UDP tunnel functions").
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74f47278

12 12月, 2014 1 次提交

Fix race condition between vxlan_sock_add and vxlan_sock_release · 00c83b01

由 Marcelo Leitner 提交于 12月 11, 2014

Currently, when trying to reuse a socket, vxlan_sock_add will grab
vn->sock_lock, locate a reusable socket, inc refcount and release
vn->sock_lock.

But vxlan_sock_release() will first decrement refcount, and then grab
that lock. refcnt operations are atomic but as currently we have
deferred works which hold vs->refcnt each, this might happen, leading to
a use after free (specially after vxlan_igmp_leave):

  CPU 1                            CPU 2

deferred work                    vxlan_sock_add
  ...                              ...
                                   spin_lock(&vn->sock_lock)
                                   vs = vxlan_find_sock();
  vxlan_sock_release
    dec vs->refcnt, reaches 0
    spin_lock(&vn->sock_lock)
                                   vxlan_sock_hold(vs), refcnt=1
                                   spin_unlock(&vn->sock_lock)
    hlist_del_rcu(&vs->hlist);
    vxlan_notify_del_rx_port(vs)
    spin_unlock(&vn->sock_lock)

So when we look for a reusable socket, we check if it wasn't freed
already before reusing it.
Signed-off-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
Fixes: 7c47cedf ("vxlan: move IGMP join/leave to work queue")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

00c83b01

03 12月, 2014 1 次提交

net: make vid as a parameter for ndo_fdb_add/ndo_fdb_del · f6f6424b

由 Jiri Pirko 提交于 11月 28, 2014

Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple
u16 vid to drivers from there.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6f6424b

26 11月, 2014 1 次提交

vxlan: Fix boolean flip in VXLAN_F_UDP_ZERO_CSUM6_[TX|RX] · 3dc2b6a8

由 Alexander Duyck 提交于 11月 24, 2014

In "vxlan: Call udp_sock_create" there was a logic error that resulted in
the default for IPv6 VXLAN tunnels going from using checksums to not using
checksums.  Since there is currently no support in iproute2 for setting
these values it means that a kernel after the change cannot talk over a IPv6
VXLAN tunnel to a kernel prior the change.

Fixes: 3ee64f39 ("vxlan: Call udp_sock_create")

Cc: Tom Herbert <therbert@google.com>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3dc2b6a8

22 11月, 2014 2 次提交

vlan: introduce *vlan_hwaccel_push_inside helpers · 5968250c

由 Jiri Pirko 提交于 11月 19, 2014

Use them to push skb->vlan_tci into the payload and avoid code
duplication.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5968250c

vlan: rename __vlan_put_tag to vlan_insert_tag_set_proto · 62749e2c

由 Jiri Pirko 提交于 11月 19, 2014

Name fits better. Plus there's going to be introduced
__vlan_insert_tag later on.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

62749e2c

19 11月, 2014 1 次提交

vxlan: Inline vxlan_gso_check(). · 11bf7828

由 Joe Stringer 提交于 11月 17, 2014

Suggested-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NJoe Stringer <joestringer@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

11bf7828

15 11月, 2014 1 次提交

net: Add vxlan_gso_check() helper · 23e62de3

由 Joe Stringer 提交于 11月 13, 2014

Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not
other UDP-based encapsulation protocols where the format and size of the
header differs. This patch implements a generic ndo_gso_check() for
VXLAN which will only advertise GSO support when the skb looks like it
contains VXLAN (or no UDP tunnelling at all).

Implementation shamelessly stolen from Tom Herbert:
http://thread.gmane.org/gmane.linux.network/332428/focus=333111Signed-off-by: NJoe Stringer <joestringer@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23e62de3

14 11月, 2014 1 次提交

vxlan: Do not reuse sockets for a different address family · 19ca9fc1

由 Marcelo Leitner 提交于 11月 13, 2014

Currently, we only match against local port number in order to reuse
socket. But if this new vxlan wants an IPv6 socket and a IPv4 one bound
to that port, vxlan will reuse an IPv4 socket as IPv6 and a panic will
follow. The following steps reproduce it:

   # ip link add vxlan6 type vxlan id 42 group 229.10.10.10 \
       srcport 5000 6000 dev eth0
   # ip link add vxlan7 type vxlan id 43 group ff0e::110 \
       srcport 5000 6000 dev eth0
   # ip link set vxlan6 up
   # ip link set vxlan7 up
   <panic>

[    4.187481] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
...
[    4.188076] Call Trace:
[    4.188085]  [<ffffffff81667c4a>] ? ipv6_sock_mc_join+0x3a/0x630
[    4.188098]  [<ffffffffa05a6ad6>] vxlan_igmp_join+0x66/0xd0 [vxlan]
[    4.188113]  [<ffffffff810a3430>] process_one_work+0x220/0x710
[    4.188125]  [<ffffffff810a33c4>] ? process_one_work+0x1b4/0x710
[    4.188138]  [<ffffffff810a3a3b>] worker_thread+0x11b/0x3a0
[    4.188149]  [<ffffffff810a3920>] ? process_one_work+0x710/0x710

So address family must also match in order to reuse a socket.
Reported-by: NJean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

19ca9fc1

11 11月, 2014 1 次提交

udptunnel: Add SKB_GSO_UDP_TUNNEL during gro_complete. · cfdf1e1b

由 Jesse Gross 提交于 11月 10, 2014

When doing GRO processing for UDP tunnels, we never add
SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol
is added (such as SKB_GSO_TCPV4). The result is that if the packet is
later resegmented we will do GSO but not treat it as a tunnel. This
results in UDP fragmentation of the outer header instead of (i.e.) TCP
segmentation of the inner header as was originally on the wire.
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cfdf1e1b

07 11月, 2014 1 次提交

vxlan: Fix to enable UDP checksums on interface · 5c91ae08

由 Tom Herbert 提交于 11月 06, 2014

Add definition to vxlan nla_policy for UDP checksum. This is necessary
to enable UDP checksums on VXLAN.

In some instances, enabling UDP checksums can improve performance on
receive for devices that return legacy checksum-unnecessary for UDP/IP.
Also, UDP checksum provides some protection against VNI corruption.

Testing:

Ran 200 instances of TCP_STREAM and TCP_RR on bnx2x.

TCP_STREAM
  IPv4, without UDP checksums
      14.41% TX CPU utilization
      25.71% RX CPU utilization
      9083.4 Mbps
  IPv4, with UDP checksums
      13.99% TX CPU utilization
      13.40% RX CPU utilization
      9095.65 Mbps

TCP_RR
  IPv4, without UDP checksums
      94.08% TX CPU utilization
      156/248/462 90/95/99% latencies
      1.12743e+06 tps
  IPv4, with UDP checksums
      94.43% TX CPU utilization
      158/250/462 90/95/99% latencies
      1.13345e+06 tps
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c91ae08

18 10月, 2014 1 次提交

vxlan: fix a free after use · 7a9f526f

由 Li RongQing 提交于 10月 17, 2014

pskb_may_pull maybe change skb->data and make eth pointer oboslete,
so eth needs to reload

Fixes: 91269e39 ("vxlan: using pskb_may_pull as early as possible")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a9f526f

16 10月, 2014 2 次提交

vxlan: using pskb_may_pull as early as possible · 91269e39

由 Li RongQing 提交于 10月 16, 2014

pskb_may_pull should be used to check if skb->data has enough space,
skb->len can not ensure that.

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91269e39

vxlan: fix a use after free in vxlan_encap_bypass · ce6502a8

由 Li RongQing 提交于 10月 16, 2014

when netif_rx() is done, the netif_rx handled skb maybe be freed,
and should not be used.
Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce6502a8

08 10月, 2014 1 次提交

net: better IFF_XMIT_DST_RELEASE support · 02875878

由 Eric Dumazet 提交于 10月 05, 2014

Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
	netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02875878

02 10月, 2014 1 次提交

vxlan: Set inner protocol before transmit · 996c9fd1

由 Tom Herbert 提交于 9月 29, 2014

Call skb_set_inner_protocol to set inner Ethernet protocol to
ETH_P_TEB before transmit. This is needed for GSO with UDP tunnels.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

996c9fd1

24 9月, 2014 1 次提交

vxlan: Fix bug introduced by commit · 3c4d1dae

由 Andy Zhou 提交于 9月 23, 2014

Commit acbf74a7 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions." introduced a bug in vxlan_xmit_one()
function, causing it to transmit Vxlan packets without proper
Vxlan header inserted. The change was not needed in the first
place. Revert it.
Reported-by: NTom Herbert <therbert@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NAndy Zhou <azhou@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c4d1dae

20 9月, 2014 1 次提交

vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions. · acbf74a7

由 Andy Zhou 提交于 9月 16, 2014

Simplify vxlan implementation using common UDP tunnel APIs.
Signed-off-by: NAndy Zhou <azhou@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

acbf74a7

02 9月, 2014 1 次提交
- T
  vxlan: Enable checksum unnecessary conversions for vxlan/UDP sockets · c60c308c
  由 Tom Herbert 提交于 8月 31, 2014
```
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  c60c308c
30 8月, 2014 1 次提交

net: Clarification of CHECKSUM_UNNECESSARY · 77cffe23

由 Tom Herbert 提交于 8月 27, 2014

This patch:
 - Clarifies the specific requirements of devices returning
   CHECKSUM_UNNECESSARY (comments in skbuff.h).
 - Adds csum_level field to skbuff. This is used to express how
   many checksums are covered by CHECKSUM_UNNECESSARY (stores n - 1).
   This replaces the overloading of skb->encapsulation, that field is
   is now only used to indicate inner headers are valid.
 - Change __skb_checksum_validate_needed to "consume" each checksum
   as indicated by csum_level as layers of the the packet are parsed.
 - Remove skb_pop_rcv_encapsulation, no longer needed in the new
   csum_level model.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77cffe23

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功