Commits · de5df63228fcfbd5bb7fd883774c18fec9e61f12 · openanolis / cloud-kernel

23 Sep, 2014 29 commits

net: sched: cls_u32 changes to knode must appear atomic to readers · de5df632

Committed by John Fastabend on Sep 19, 2014

Changes to the cls_u32 classifier must appear atomic to the
readers. Before this patch if a change is requested for both
the exts and ifindex, first the ifindex is updated then the
exts with tcf_exts_change(). This opens a small window where
a reader can have an exts chain with an incorrect ifindex. This
violates the RCU semantics.

Here we resolve this by always passing u32_set_parms() a copy
of the tc_u_knode to work on and then inserting it into the hash
table after the updates have been successfully applied.

Tested with the following short script:

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 handle 1: \
	       u32 divisor 256

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 \
	       u32 link 1: hashkey mask ffffff00 at 12    \
	       match ip src 192.168.8.0/2

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 102    \
	       handle 1::10 u32 classid 1:2 ht 1: 	      \
	       match ip src 192.168.8.0/8 match ip tos 0x0a 1e

#tc filter change dev p3p2 parent 8001:0 protocol ip prio 102 \
		 handle 1::10 u32 classid 1:2 ht 1:        \
		 match ip src 1.1.0.0/8 match ip tos 0x0b 1e

CC: Eric Dumazet <edumazet@google.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de5df632
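
The copy-then-publish approach described in de5df632 can be modelled outside the kernel. The following is a minimal Python sketch, not the kernel code: `Knode`, `set_params`, and `change_knode` are hypothetical names, and a plain dict assignment stands in for publishing the new node to RCU readers.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Knode:
    """Hypothetical stand-in for a classifier node seen by concurrent readers."""
    handle: int
    ifindex: int = 0
    exts: list = field(default_factory=list)


def set_params(node, ifindex=None, exts=None):
    """Apply every requested update to `node`; raise on invalid input."""
    if ifindex is not None:
        if ifindex < 0:
            raise ValueError("invalid ifindex")
        node.ifindex = ifindex
    if exts is not None:
        node.exts = list(exts)


def change_knode(table, handle, **updates):
    """Update a private copy, then publish it with a single assignment."""
    new = copy.deepcopy(table[handle])  # readers still see the old node
    set_params(new, **updates)          # may fail; the table is untouched
    table[handle] = new                 # one-step swap: old or new, never a mix
    return new


table = {0x10: Knode(handle=0x10, ifindex=1, exts=["police"])}
old = table[0x10]
change_knode(table, 0x10, ifindex=2, exts=["mirred"])
```

A reader that fetched `old` before the swap keeps a consistent (ifindex, exts) pair, and a failed update never leaves a half-modified node in the table.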

net: cls_u32: fix missed pcpu_success free_percpu · a1ddcfee

Committed by John Fastabend on Sep 19, 2014

This fixes a missed free_percpu in the unwind code path and when
keys are destroyed.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1ddcfee

bonding: remove the unnecessary notes for bond_xmit_broadcast() · 37ab7ddf

Committed by dingtianhong on Sep 19, 2014

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

37ab7ddf

bonding: slight optimization for bond_xmit_roundrobin() · a64d044e

Committed by dingtianhong on Sep 19, 2014

When the slave is the curr_active_slave, there is no need to check
whether the slave is active or not; it is always active.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a64d044e

udp: Need to make ip6_udp_tunnel.c have GPL license · 3fcb95a8

Committed by Tom Herbert on Sep 22, 2014

Unable to load various tunneling modules without this:

[   80.679049] fou: Unknown symbol udp_sock_create6 (err 0)
[   91.439939] ip6_udp_tunnel: Unknown symbol ip6_local_out (err 0)
[   91.439954] ip6_udp_tunnel: Unknown symbol __put_net (err 0)
[   91.457792] vxlan: Unknown symbol udp_sock_create6 (err 0)
[   91.457831] vxlan: Unknown symbol udp_tunnel6_xmit_skb (err 0)
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3fcb95a8

Merge branch 'be2net-next' · 5624e80f

Committed by David S. Miller on Sep 22, 2014

Sathya Perla says:

====================
be2net: patch set

Patches 1 and 2 fix sparse warnings (static declaration needed and endian
declaration needed) introduced by the earlier patch set.

Patches 3 and 4 add 20G/40G speed reporting via ethtool for the Skyhawk-R
chip.

Patches 5 to 12 fix various style issues and checkpatch warnings in the
driver such as:
	- removing unnecessary return statements in void routines
	- adding needed blank lines after a declaration block
	- deleting multiple blank lines
	- inserting a blank line after a function/struct definition
	- removing space after typecast
	- fixing multiple assignments on a single line
	- fixing alignment on a line wrap
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5624e80f

be2net: fix alignment on line wrap · cd3307aa

Committed by Kalesh AP on Sep 19, 2014

This patch fixes alignment wherever it doesn't match the open parenthesis
alignment.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cd3307aa

be2net: remove multiple assignments on a single line · 5f820b6c

Committed by Kalesh AP on Sep 19, 2014

This patch removes multiple assignments on a single line as warned
by checkpatch.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f820b6c

be2net: remove space after typecasts · 504fbf1e

Committed by Kalesh AP on Sep 19, 2014

This patch removes unnecessary spaces after typecasts as per checkpatch warnings.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

504fbf1e

be2net: remove unnecessary blank lines after an open brace · 619f2d1a

Committed by Kalesh AP on Sep 19, 2014

This patch fixes checkpatch warnings about blank lines after an open brace '{'.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

619f2d1a

be2net: insert a blank line after function/struct//enum definitions · e2fb1afa

Committed by Kalesh AP on Sep 19, 2014

This patch inserts a blank line after function/struct/union/enum definitions
as per checkpatch warnings.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e2fb1afa

be2net: remove multiple blank lines · d6f5473c

Committed by Kalesh AP on Sep 19, 2014

This patch removes multiple blank lines in the driver as per checkpatch
warnings.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d6f5473c

be2net: add blank line after declarations · 03d28ffe

Committed by Kalesh AP on Sep 19, 2014

This patch fixes checkpatch warnings in be2net by adding a blank line
between declaration and code blocks.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03d28ffe

be2net: remove return statements for void functions · 627cd5f8

Committed by Kalesh AP on Sep 19, 2014

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

627cd5f8

be2net: add speed reporting for 20G-KR interface · d6b7a9b7

Committed by Vasundhara Volam on Sep 19, 2014

This patch adds speed reporting via ethtool for the 20G KR2 interface on the
Skyhawk-R chip.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d6b7a9b7

be2net: add speed reporting for 40G/KR interface · ca39076c

Committed by Kalesh AP on Sep 19, 2014

This patch adds speed reporting via ethtool for the 40Gbps KR4 interface
on the Skyhawk-R chip.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca39076c

be2net: fix sparse warnings in be_cmd_req_port_type{} · 72d7e2bf

Committed by Suresh Reddy on Sep 19, 2014

This patch fixes sparse warnings regarding endian declarations introduced
by the following commit:

fixes: e36edd9d ("be2net: add ethtool "-m" option support")
Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72d7e2bf

be2net: fix a sparse warning in be_cmd_modify_eqd() · b502ae8d

Committed by Kalesh AP on Sep 19, 2014

This patch fixes a sparse warning about a missing static declaration that was
introduced by the following commit:

fixes: 93676703 ("be2net: send a max of 8 EQs to be_cmd_modify_eqd() on Lancer")
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b502ae8d

net: keep original skb which only needs header checking during software GSO · cecda693

Committed by Jason Wang on Sep 19, 2014

Commit ce93718f ("net: Don't keep
around original SKB when we software segment GSO frames") frees the
original skb after software GSO even for dodgy gso skbs. This breaks
the stream throughput from untrusted sources, since only header
checking was done during software GSO instead of a true
segmentation. This patch fixes this by freeing the original gso skb
only when it was really segmented by software.

Fixes ce93718f ("net: Don't keep
around original SKB when we software segment GSO frames.")

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cecda693
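
The rule introduced by cecda693 (free the original skb only when software really segmented it) can be illustrated with a small model. This is a hedged Python sketch rather than kernel code: the dict-based "skb", the `dodgy` and `hw_can_segment` keys, and both function names are assumptions made for illustration.

```python
def gso_segment(skb):
    """Hypothetical model of software GSO.

    Returns a list of new segments, or None when the skb only needed its
    headers checked and can still be handed to the device as one GSO frame.
    """
    if skb["dodgy"] and skb["hw_can_segment"]:
        return None  # header check only, no real segmentation
    mss = skb["mss"]
    data = skb["payload"]
    return [{"payload": data[i:i + mss]} for i in range(0, len(data), mss)]


def xmit_prepare(skb, freed):
    """Return the skbs to transmit; record released originals in `freed`."""
    segs = gso_segment(skb)
    if segs is None:
        return [skb]       # the original skb is still the thing to send
    freed.append(skb)      # really segmented: the original can be released
    return segs


freed = []
untrusted = {"dodgy": True, "hw_can_segment": True, "mss": 4, "payload": b"abcdefgh"}
trusted = {"dodgy": False, "hw_can_segment": False, "mss": 4, "payload": b"abcdefgh"}
kept = xmit_prepare(untrusted, freed)
split = xmit_prepare(trusted, freed)
```

In this model the untrusted frame survives intact, while the frame that was actually split is the only one released.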

net: fec: fix code identation · b749fc9b

Committed by Nimrod Andy on Sep 19, 2014

There is extra indentation before skb_copy_to_linear_data_offset();
this patch just removes the indentation.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b749fc9b

Merge branch 'dsa-suspend' · 61a3bd14

Committed by David S. Miller on Sep 22, 2014

Florian Fainelli says:

====================
dsa: Broadcom SF2 suspend/resume and WoL

This patch set adds support for suspend/resume and configuring Wake-on-LAN
for Broadcom Starfighter 2 switches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

61a3bd14

net: dsa: bcm_sf2: add support for Wake-on-LAN · 96e65d7f

Committed by Florian Fainelli on Sep 18, 2014

In order for Wake-on-LAN to work properly, we query the parent network
device's Wake-on-LAN features and advertise those. Similarly, when
configuring Wake-on-LAN on a per-port network interface, we make sure
that we do not accept something the master network device does not
support.

Finally, we need to maintain a bitmask of the ports enabled for
Wake-on-LAN to prevent the suspend() callback from disabling a port that
is used for waking up the system.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

96e65d7f
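
The two checks described in 96e65d7f (reject modes the parent device lacks, and keep Wake-on-LAN ports up across suspend) can be sketched as follows. This is an illustrative Python model, not the bcm_sf2 driver: the `Switch` class and its attribute names are hypothetical, and the `WAKE_*` values are illustrative bit flags.

```python
WAKE_PHY = 1 << 0    # illustrative flag values for this sketch
WAKE_MAGIC = 1 << 5


class Switch:
    """Hypothetical switch model with per-port Wake-on-LAN state."""

    def __init__(self, num_ports, parent_supported):
        self.parent_supported = parent_supported  # modes the master device offers
        self.wol_ports_mask = 0                   # bit N set: port N can wake the system
        self.enabled_ports = set(range(num_ports))

    def set_wol(self, port, wolopts):
        # Refuse anything the parent network device cannot do.
        if wolopts & ~self.parent_supported:
            raise ValueError("mode not supported by parent device")
        if wolopts:
            self.wol_ports_mask |= 1 << port
        else:
            self.wol_ports_mask &= ~(1 << port)

    def suspend(self):
        # Shut down every port except those needed to wake the system.
        for port in sorted(self.enabled_ports):
            if self.wol_ports_mask & (1 << port):
                continue
            self.enabled_ports.discard(port)


sw = Switch(num_ports=4, parent_supported=WAKE_MAGIC)
sw.set_wol(2, WAKE_MAGIC)
sw.suspend()
```

After `suspend()` only port 2 remains enabled in this model, which is the behaviour the bitmask exists to guarantee.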

net: dsa: add {get, set}_wol callbacks to slave devices · 19e57c4e

Committed by Florian Fainelli on Sep 18, 2014

Allow switch drivers to implement per-port Wake-on-LAN getters and
setters.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19e57c4e

net: dsa: bcm_sf2: add suspend/resume callbacks · 8cfa9498

Committed by Florian Fainelli on Sep 18, 2014

Implement the suspend/resume callbacks for the Broadcom Starfighter 2
switch driver. Suspending the switch requires masking interrupts and
shutting down ports. Resuming the switch requires a software reset, since
we do not know which power state we might be coming from, and re-enabling
the physical ports that are used.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8cfa9498

net: dsa: allow switch drivers to implement suspend/resume hooks · 24462549

Committed by Florian Fainelli on Sep 18, 2014

Add an abstraction layer to suspend/resume switch devices, doing the
following split:

- suspend/resume the slave network devices and their corresponding PHY
  devices
- suspend/resume the switch hardware using switch driver callbacks
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24462549

Merge branch 'qlge' · 34f6b874

Committed by David S. Miller on Sep 22, 2014

Harish Patil says:

====================
qlge: Fix compilation warning and update maintainers

This patch series includes the following set of patches:

- Fix the below warning message:
  qlge_main.c:1754: warning: 'lbq_desc' may be used uninitialized in this function

I have made changes according to your earlier feedback:

"Please fix this differently.  The problem is that the compiler can't see that
you've done the !length check at the top of the function, so when it later
sees the while (length > 0) loop, it doesn't know that this loop will always
execute at least once. Just change that loop to a do { } while() loop and
the compiler will be able to see everything."

- Update qlge driver maintainers list
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

34f6b874

Update qlge driver maintainers list · c9b1a5b5

Committed by Harish Patil on Sep 18, 2014

Signed-off-by: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9b1a5b5

qlge: Fix compilation warning · afe6e00c

Committed by Harish Patil on Sep 18, 2014

Fix the below warning message:
qlge_main.c:1754: warning: 'lbq_desc' may be used uninitialized in this function
Signed-off-by: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

afe6e00c

am2150: Update nmclan_cs.c to use update PCMCIA API · 5f5316fc

Committed by Jeff Kirsher on Sep 18, 2014

Resolves compile warning about use of a deprecated function call:
drivers/net/ethernet/amd/nmclan_cs.c: In function ‘nmclan_config’:
drivers/net/ethernet/amd/nmclan_cs.c:624:3: warning: ‘pcmcia_request_exclusive_irq’ is deprecated (declared at include/pcmcia/ds.h:213) [-Wdeprecated-declarations]
ret = pcmcia_request_exclusive_irq(link, mace_interrupt);

Updates pcmcia_request_exclusive_irq() to pcmcia_request_irq().

CC: Roger Pao <rpao@paonet.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f5316fc

20 Sep, 2014 11 commits

udp_tunnel: Only build ip6_udp_tunnel.c when IPV6 is selected · 6d967f87

Committed by Andy Zhou on Sep 19, 2014

Functions supplied in ip6_udp_tunnel.c are only needed when IPV6 is
selected. When IPV6 is not selected, those functions are stubbed out
in udp_tunnel.h.

==================================================================
 net/ipv6/ip6_udp_tunnel.c:15:5: error: redefinition of 'udp_sock_create6'
     int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
 In file included from net/ipv6/ip6_udp_tunnel.c:9:0:
      include/net/udp_tunnel.h:36:19: note: previous definition of 'udp_sock_create6' was here
       static inline int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
==================================================================

Fixes:  fd384412 udp_tunnel: Seperate ipv6 functions into its own file
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d967f87

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 6c62f606

Committed by David S. Miller on Sep 19, 2014

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-09-18

This series contains updates to ixgbe and ixgbevf.

Ethan Zhao cleans up ixgbe and ixgbevf by removing bd_number from the
adapter struct because it is no longer useful.

Mark fixes ixgbe where if a hardware transmit timestamp is requested,
an uninitialized workqueue entry may be scheduled.  Added a check for
a PTP clock to avoid that.

Jacob provides a number of cleanups for ixgbe.  Since we may call
ixgbe_acquire_msix_vectors() prior to registering our netdevice, we
should not use the netdevice specific printk and use e_dev_warn()
instead.  Similar to how ixgbevf handles acquiring MSI-X vectors, we
can return an error code instead of relying on the flag being set.
This makes it more clear that we have failed to setup MSI-X mode and
will make it easier to consolidate MSI-X related code into a single
function.  In the case of disabling DCB, it is not an error since we
still can function, we just have to let the user know.  So use
e_dev_warn() instead of e_err().  Added warnings for other features
that are disabled when we are without MSI-X support.  Cleanup flags
that are no longer used or needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6c62f606

Merge branch 'mlx4-next' · 58310b3f

Committed by David S. Miller on Sep 19, 2014

Or Gerlitz says:

====================
mlx4: CQE/EQE stride support

This series from Ido Shamay is intended for archs having a
cache line larger than 64 bytes.

Since our CQE/EQEs are generally 64B in those systems, HW will write
twice to the same cache line consecutively, causing pipe locks due to
the hazard prevention mechanism. For elements in a cyclic buffer, writes
are consecutive, so entries smaller than a cache line should be
avoided, especially if they are written at a high rate.

Reduce consecutive writes to the same cache line in CQs/EQs by allowing the
driver to increase the distance between entries so that each will reside
in a different cache line.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

58310b3f

net/mlx4_en: Add mlx4_en_get_cqe helper · b1b6b4da

Committed by Ido Shamay on Sep 18, 2014

This function derives the base address of the CQE from the CQE size,
and calculates the real CQE context segment in it from the factor
(as before). Before this change the code used the factor to
calculate the base address of the CQE as well.

The factor indicates in which segment of the CQE stride the CQE information
is located. For 32-byte strides, the segment is 0, and for 64-byte strides,
the segment is 1 (bytes 32..63). Using the factor was fine as long as we had
only 32- and 64-byte strides. However, with larger strides, the factor is zero,
and so cannot be used to calculate the base of the CQE.

The helper uses the same method of CQE buffer pulling used by all other
components that read the CQE buffer (mlx4_ib driver and libmlx4).
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1b6b4da
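
The offset arithmetic described in b1b6b4da can be worked through in a few lines. The following Python sketch only models byte offsets; it is not the driver helper. All function names are hypothetical, and `old_base_offset` is an assumed reconstruction of the factor-based style, included to show why it breaks for strides above 64 bytes.

```python
CQE_CONTEXT_SIZE = 32  # bytes of CQE information per entry, per the text above


def cqe_factor(cqe_size):
    """Segment of the stride that holds the CQE information."""
    return 1 if cqe_size == 64 else 0


def cqe_base_offset(index, cqe_size):
    """Byte offset of entry `index`, derived from the CQE size alone."""
    return index * cqe_size


def cqe_context_offset(index, cqe_size):
    """Byte offset of the 32-byte context segment of entry `index`."""
    return cqe_base_offset(index, cqe_size) + cqe_factor(cqe_size) * CQE_CONTEXT_SIZE


def old_base_offset(index, cqe_size):
    """Assumed factor-based style: correct only for 32B and 64B strides."""
    return (index << cqe_factor(cqe_size)) * CQE_CONTEXT_SIZE


for size in (32, 64, 128, 256):
    print(size, cqe_context_offset(3, size), old_base_offset(3, size))
```

For entry 3 the two approaches agree on the base for 32B and 64B strides, but with a 128B stride the factor is zero, so the factor-based base lands at byte 96 instead of byte 384.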

net/mlx4_core: Cache line EQE size support · 43c816c6

Committed by Ido Shamay on Sep 18, 2014

Enable the mlx4 interrupt handler to work with the EQE stride feature.
The feature may be enabled when the cache line is bigger than 64B.
The EQE size will then be the cache line size, and the context
segment resides in the [0-31] offset.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

43c816c6

net/mlx4_core: Enable CQE/EQE stride support · 77507aa2

Committed by Ido Shamay on Sep 18, 2014

This feature is intended for archs having a cache line larger than 64B.

Since our CQE/EQEs are generally 64B in those systems, HW will write
twice to the same cache line consecutively, causing pipe locks due to
the hazard prevention mechanism. For elements in a cyclic buffer, writes
are consecutive, so entries smaller than a cache line should be
avoided, especially if they are written at a high rate.

Reduce consecutive writes to same cache line in CQs/EQs, by allowing the
driver to increase the distance between entries so that each will reside
in a different cache line. Until the introduction of this feature, there
were two types of CQE/EQE:

1. 32B stride and context in the [0-31] segment
2. 64B stride and context in the [32-63] segment

This feature introduces two additional types:

3. 128B stride and context in the [0-31] segment (128B cache line)
4. 256B stride and context in the [0-31] segment (256B cache line)

Modify the mlx4_core driver to query the device for the CQE/EQE cache
line stride capability and to enable that capability when the host
cache line size is larger than 64 bytes (supported cache lines are
128B and 256B).

The mlx4 IB driver and libmlx4 need not be aware of this change. The PF
context behaviour is changed to require this change in VF drivers
running on such archs.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77507aa2
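
The four CQE/EQE layouts and the enabling condition listed in 77507aa2 can be summarised as a lookup table plus a selection rule. This is a minimal Python sketch under stated assumptions: `choose_stride`, `entries_sharing_a_line`, and the 64B fallback default are hypothetical and not taken from mlx4_core.

```python
# stride in bytes -> (first, last) byte of the context segment, per the list above
STRIDE_CONTEXT = {32: (0, 31), 64: (32, 63), 128: (0, 31), 256: (0, 31)}
SUPPORTED_CACHE_LINES = (128, 256)


def choose_stride(cache_line_size, device_has_stride_cap, legacy_stride=64):
    """Use a cache-line-sized stride only when both host and device allow it."""
    if device_has_stride_cap and cache_line_size in SUPPORTED_CACHE_LINES:
        return cache_line_size
    return legacy_stride


def entries_sharing_a_line(stride, cache_line_size):
    """How many consecutive entries land in one cache line."""
    return max(1, cache_line_size // stride)


stride = choose_stride(cache_line_size=128, device_has_stride_cap=True)
print(stride, STRIDE_CONTEXT[stride], entries_sharing_a_line(stride, 128))
```

With 64B entries on a 128B cache line, two consecutive entries share a line; once the stride equals the cache line size, each entry has a line to itself, which is the stated goal of the feature.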

net: fix sparse warnings in SNMP_UPD_PO_STATS(_BH) · 54003f11

Committed by Sabrina Dubroca on Sep 17, 2014

ptr used to be a non __percpu pointer (result of a this_cpu_ptr
assignment, 7d720c3e ("percpu: add __percpu sparse annotations to
net")). Since d25398df ("net: avoid reloads in SNMP_UPD_PO_STATS"),
that's no longer the case, SNMP_UPD_PO_STATS uses this_cpu_add and ptr
is now __percpu.

Silence sparse warnings by preserving the original type and
annotation, and remove the out-of-date comment.

warning: incorrect type in initializer (different address spaces)
   expected unsigned long long *ptr
   got unsigned long long [noderef] <asn:3>*<noident>
warning: incorrect type in initializer (different address spaces)
   expected void const [noderef] <asn:3>*__vpp_verify
   got unsigned long long *<noident>
warning: incorrect type in initializer (different address spaces)
   expected void const [noderef] <asn:3>*__vpp_verify
   got unsigned long long *<noident>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

54003f11

Merge branch 'fou-next' · fb5690d2

Committed by David S. Miller on Sep 19, 2014

Tom Herbert says:

====================
net: foo-over-udp (fou)

This patch series implements foo-over-udp. The idea is that we can
encapsulate different IP protocols in UDP packets. The rationale for
this is that networking devices such as NICs and switches are usually
implemented with UDP (and TCP) specific mechanisms for processing. For
instance, many switches and routers will implement a 5-tuple hash
for UDP packets to perform Equal Cost Multipath Routing (ECMP) or
RSS (on NICs). Many NICs also only provide rudimentary checksum
offload (basic TCP and UDP packet); with foo-over-udp we may be
able to leverage these NICs to offload checksums of tunneled packets
(using checksum unnecessary conversion and eventually remote checksum
offload).
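
To make the 5-tuple argument concrete, here is a small Python sketch of hash-based path selection. It is an illustration only: `five_tuple_hash` and `pick_path` are hypothetical names, and CRC32 is a stand-in for whatever hash a given switch or NIC implements.

```python
import ipaddress
import struct
import zlib

IPPROTO_UDP = 17


def five_tuple_hash(src, dst, proto, sport, dport):
    """Hash (src IP, dst IP, protocol, src port, dst port); CRC32 is a stand-in."""
    key = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    key += struct.pack("!BHH", proto, sport, dport)
    return zlib.crc32(key)


def pick_path(n_paths, *flow):
    """Map a flow to one of `n_paths` equal-cost paths or receive queues."""
    return five_tuple_hash(*flow) % n_paths


# Two tunnelled flows between the same endpoints, differing only in the
# outer UDP source port chosen by the encapsulator.
flow_a = ("192.168.1.2", "192.168.1.1", IPPROTO_UDP, 40001, 5555)
flow_b = ("192.168.1.2", "192.168.1.1", IPPROTO_UDP, 40002, 5555)
print(pick_path(4, *flow_a), pick_path(4, *flow_b))
```

Without the UDP header, both flows would present the same addresses and protocol to the network; with it, the source port gives the hash something to distinguish them by.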

An example encapsulation of IPIP over FOU is diagrammed below. As
illustrated, the packet overhead for FOU is the 8 byte UDP header.

+------------------+
|    IPv4 hdr      |
+------------------+
|     UDP hdr      |
+------------------+
|    IPv4 hdr      |
+------------------+
|     TCP hdr      |
+------------------+
|   TCP payload    |
+------------------+

Conceptually, FOU should be able to encapsulate any IP protocol.
The FOU header (UDP hdr.) is essentially an inserted header between the
IP header and transport, so in the case of TCP or UDP encapsulation
the pseudo header would be based on the outer IP header and its length
field must not include the UDP header.
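
The 8-byte overhead shown in the diagram can be checked by building the outer UDP header by hand. This is a minimal Python sketch of the framing only, assuming a zero (absent) UDP checksum; `fou_encap` and `fou_decap` are hypothetical helper names and the inner packet is dummy data.

```python
import struct

UDP_HDR_LEN = 8  # source port, destination port, length, checksum: 2 bytes each


def fou_encap(inner, sport, dport):
    """Prepend a UDP header to `inner` (e.g. an IPv4 packet for IPIP)."""
    length = UDP_HDR_LEN + len(inner)
    # Checksum 0 means "no checksum" for UDP over IPv4; a real sender may compute one.
    return struct.pack("!HHHH", sport, dport, length, 0) + inner


def fou_decap(datagram):
    """Split a UDP datagram into (destination port, encapsulated packet)."""
    _sport, dport, length, _csum = struct.unpack("!HHHH", datagram[:UDP_HDR_LEN])
    return dport, datagram[UDP_HDR_LEN:length]


inner = bytes(40)  # placeholder for an inner IPv4 + TCP header pair
datagram = fou_encap(inner, sport=40000, dport=5555)
overhead = len(datagram) - len(inner)
```

The outer IPv4 header is left out here because it is present for any IP tunnel; the UDP header is the only part FOU adds.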

* Receive

In this patch set the RX path for FOU is implemented in a new fou
module. To enable FOU for a particular protocol, a UDP-FOU socket is
opened to the port to receive FOU packets. The socket is mapped to the
IP protocol for the packets. The XFRM mechanism (udp_encap_rcv) is used
to receive encapsulated packets for the port. Upon reception, the
UDP header is removed and the packet is reinjected into the stack for the
corresponding protocol associated with the socket (return -protocol
from the udp_encap_rcv function).
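
The receive flow above (port mapped to an IP protocol, then a negative return value asking the stack to resubmit the packet) can be modelled as follows. This is a hedged Python sketch, not the fou module: `fou_add`, `fou_udp_recv`, `resubmit`, and the handler table are hypothetical, and slicing off the header is a simplification of what the kernel does with the skb.

```python
UDP_HDR_LEN = 8
IPPROTO_IPIP = 4
IPPROTO_GRE = 47

fou_ports = {}  # UDP destination port -> IP protocol number


def fou_add(port, ipproto):
    """Register `port` as a FOU receive port for `ipproto`."""
    fou_ports[port] = ipproto


def fou_udp_recv(dport, datagram):
    """Return (-protocol, payload) for a FOU port, or None for ordinary UDP."""
    ipproto = fou_ports.get(dport)
    if ipproto is None:
        return None
    return -ipproto, datagram[UDP_HDR_LEN:]


def resubmit(result, handlers):
    """Dispatch the payload to the handler for protocol -code."""
    code, payload = result
    return handlers[-code](payload)


fou_add(5555, IPPROTO_IPIP)
handlers = {
    IPPROTO_IPIP: lambda p: ("ipip", p),
    IPPROTO_GRE: lambda p: ("gre", p),
}
datagram = bytes(UDP_HDR_LEN) + b"inner-ip-packet"
result = fou_udp_recv(5555, datagram)
```

The negative return value is what lets a single UDP receive hook serve any encapsulated protocol without knowing its handler.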

GRO is provided with the appropriate fou_gro_receive and
fou_gro_complete. These routines need to know the encapsulation
protocol so we save that in udp_offloads structure with the port
and pass it in the napi_gro_cb structure.

* TX

This patch series implements FOU transmit encapsulation for IPIP, GRE, and
SIT. This is done by some common infrastructure in ip_tunnel, including an
ip_tunnel_encap to perform FOU encapsulation and common configuration
to enable FOU on IP tunnels. FOU is configured on existing tunnels and
does not create any new interfaces. The transmit and receive paths are
independent, so use of FOU may be asymmetric between tunnel endpoints.

* Configuration

The fou module uses netlink to configure FOU receive ports. The ip
command can be augmented with a fou subcommand to support this. e.g. to
configure FOU for IPIP on port 5555:

  ip fou add port 5555 ipproto 4

GRE, IPIP, and SIT have been modified with netlink commands to
configure use of FOU on transmit. The "ip link" command will be
augmented with an encap subcommand (for supporting various forms of
secondary encapsulation). For instance, to configure an ipip tunnel
with FOU on port 5555:

  ip link add name tun1 type ipip \
    remote 192.168.1.1 local 192.168.1.2 ttl 225 \
    encap fou encap-sport auto encap-dport 5555

* Notes
  - This patch set does not implement GSO for FOU. The UDP encapsulation
    code assumes TEB, so that will need to be reimplemented.
  - When a packet is received through FOU, the UDP header is not
    actually removed from the skbuf; pointers to the transport header
    and the length in the IP header are updated (like in ESP/UDP RX). A
    side effect is that the IP header will now appear to have an incorrect
    checksum to an external observer (e.g. tcpdump); it will be off
    by sizeof UDP header. If necessary we could adjust the checksum
    to compensate.
  - Performance results are below. My expectation is that FOU should
    entail little overhead (clearly there is some work to do :-) ).
    Optimizing UDP socket lookup for encapsulation ports should help
    significantly.
  - I really don't expect/want devices to have special support for any
    of this. Generic checksum offload mechanisms (NETIF_HW_CSUM
    and use of CHECKSUM_COMPLETE) should be sufficient. RSS and flow
    steering is provided by commonly implemented UDP hashing. GRO/GSO
    seem fairly comparable with LRO/TSO already.

* Performance

Ran netperf TCP_RR and TCP_STREAM tests across various configurations.
This was performed on bnx2x and I disabled TSO/GSO on sender to get
fair comparison for FOU versus non-FOU. CPU utilization is reported
for receive in TCP_STREAM.

  GRE
    IPv4, FOU, UDP checksum enabled
      TCP_STREAM
        24.85% CPU utilization
        9310.6 Mbps
      TCP_RR
        94.2% CPU utilization
        155/249/460 90/95/99% latencies
        1.17018e+06 tps
    IPv4, FOU, UDP checksum disabled
      TCP_STREAM
        31.04% CPU utilization
        9302.22 Mbps
      TCP_RR
        94.13% CPU utilization
        154/239/419 90/95/99% latencies
        1.17555e+06 tps
    IPv4, no FOU
      TCP_STREAM
        23.13% CPU utilization
        9354.58 Mbps
      TCP_RR
        90.24% CPU utilization
        156/228/360 90/95/99% latencies
        1.18169e+06 tps

  IPIP
    FOU, UDP checksum enabled
      TCP_STREAM
        24.13% CPU utilization
        9328 Mbps
      TCP_RR
        94.23% CPU utilization
        149/237/429 90/95/99% latencies
        1.19553e+06 tps
    FOU, UDP checksum disabled
      TCP_STREAM
        29.13% CPU utilization
        9370.25 Mbps
      TCP_RR
        94.13% CPU utilization
        149/232/398 90/95/99% latencies
        1.19225e+06 tps
    No FOU
      TCP_STREAM
        10.43% CPU utilization
        5302.03 Mbps
      TCP_RR
        51.53% CPU utilization
        215/324/475 90/95/99% latencies
        864998 tps

  SIT
    FOU, UDP checksum enabled
      TCP_STREAM
        30.38% CPU utilization
        9176.76 Mbps
      TCP_RR
        96.9% CPU utilization
        170/281/581 90/95/99% latencies
        1.03372e+06 tps
    FOU, UDP checksum disabled
      TCP_STREAM
        39.6% CPU utilization
        9176.57 Mbps
      TCP_RR
        97.14% CPU utilization
        167/272/548 90/95/99% latencies
        1.03203e+06 tps
    No FOU
      TCP_STREAM
        11.2% CPU utilization
        4636.05 Mbps
      TCP_RR
        59.51% CPU utilization
        232/346/489 90/95/99% latencies
        813199 tps

v2:
  - Removed encap IP tunnel ioctls, configuration is done by netlink
    only.
  - Don't export fou_create and fou_destroy, they are currently
    intended to be called within fou module only.
  - Filled in tunnel netlink structures and functions for new values.

v3:
  - Fixed change logs for some of the patches.
  - Remove inline from fou_gro_receive and fou_gro_complete, let
    compiler decide on these.

v4:
  - Don't need to cast void in fou_from_sock
  - Removed incorrect htons for port in fou_destroy
  - Some minor cleanup for readability
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fb5690d2

gre: Setup and TX path for gre/UDP foo-over-udp encapsulation · 4565e991

Committed by Tom Herbert on Sep 17, 2014

Added netlink attrs to configure FOU encapsulation for GRE, netlink
handling of these flags, and proper MTU adjustment for encapsulation.
ip_tunnel_encap is called from ip_tunnel_xmit to actually perform FOU
encapsulation.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4565e991

ipip: Setup and TX path for ipip/UDP foo-over-udp encapsulation · 473ab820

Committed by Tom Herbert on Sep 17, 2014

Add netlink handling for IP tunnel encapsulation parameters and
adjustment of MTU for encapsulation.  ip_tunnel_encap is called
from ip_tunnel_xmit to actually perform FOU encapsulation.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

473ab820

sit: Setup and TX path for sit/UDP foo-over-udp encapsulation · 14909664

Committed by Tom Herbert on Sep 17, 2014

Added netlink handling of IP tunnel encapsulation parameters and proper
MTU adjustment for encapsulation. Added ip_tunnel_encap call to
ipip6_tunnel_xmit to actually perform FOU encapsulation.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14909664

openanolis / cloud-kernel synced successfully about 1 year ago