提交 · 576b7cd2f6ff1e90b3fc0a000d2fe74f8a50a4bb · openeuler / raspberrypi-kernel

18 3月, 2015 1 次提交

netdevice.h: fix ndo_bridge_* comments · ad41faa8

由 Nicolas Dichtel 提交于 3月 17, 2015

The argument 'flags' was missing in ndo_bridge_setlink().
ndo_bridge_dellink() was missing.

Fixes: 407af329 ("bridge: Add netlink interface to configure vlans on bridge ports")
Fixes: add511b3 ("bridge: add flags argument to ndo_bridge_setlink and ndo_bridge_dellink")
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad41faa8

21 2月, 2015 1 次提交

net: Initialize all members in skb_gro_remcsum_init() · 846cd667

由 Geert Uytterhoeven 提交于 2月 18, 2015

skb_gro_remcsum_init() initializes the gro_remcsum.delta member only,
leading to compiler warnings about a possibly uninitialized
gro_remcsum.offset member:

drivers/net/vxlan.c: In function ‘vxlan_gro_receive’:
drivers/net/vxlan.c:602: warning: ‘grc.offset’ may be used uninitialized in this function
net/ipv4/fou.c: In function ‘gue_gro_receive’:
net/ipv4/fou.c:262: warning: ‘grc.offset’ may be used uninitialized in this function

While these are harmless for now:
  - skb_gro_remcsum_process() sets offset before changing delta,
  - skb_gro_remcsum_cleanup() checks if delta is non-zero before
    accessing offset,
it's safer to let the initialization function initialize all members.
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Acked-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

846cd667

12 2月, 2015 4 次提交

net: Infrastructure for CHECKSUM_PARTIAL with remote checsum offload · 15e2396d

由 Tom Herbert 提交于 2月 10, 2015

This patch adds infrastructure so that remote checksum offload can
set CHECKSUM_PARTIAL instead of calling csum_partial and writing
the modfied checksum field.

Add skb_remcsum_adjust_partial function to set an skb for using
CHECKSUM_PARTIAL with remote checksum offload.  Changed
skb_remcsum_process and skb_gro_remcsum_process to take a boolean
argument to indicate if checksum partial can be set or the
checksum needs to be modified using the normal algorithm.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

15e2396d

net: Use more bit fields in napi_gro_cb · baa32ff4

由 Tom Herbert 提交于 2月 10, 2015

This patch moves the free and same_flow fields to be bit fields
(2 and 1 bit sized respectively). This frees up some space for u16's.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

baa32ff4

net: Clarify meaning of CHECKSUM_PARTIAL for receive path · 6edec0e6

由 Tom Herbert 提交于 2月 10, 2015

The current meaning of CHECKSUM_PARTIAL for validating checksums
is that _all_ checksums in the packet are considered valid.
However, in the manner that CHECKSUM_PARTIAL is set only the checksum
at csum_start+csum_offset and any preceding checksums may
be considered valid. If there are checksums in the packet after
csum_offset it is possible they have not been verfied.

This patch changes CHECKSUM_PARTIAL logic in skb_csum_unnecessary and
__skb_gro_checksum_validate_needed to only considered checksums
referring to csum_start and any preceding checksums (with starting
offset before csum_start) to be verified.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6edec0e6

net: Fix remcsum in GRO path to not change packet · 26c4f7da

由 Tom Herbert 提交于 2月 10, 2015

Remote checksum offload processing is currently the same for both
the GRO and non-GRO path. When the remote checksum offload option
is encountered, the checksum field referred to is modified in
the packet. So in the GRO case, the packet is modified in the
GRO path and then the operation is skipped when the packet goes
through the normal path based on skb->remcsum_offload. There is
a problem in that the packet may be modified in the GRO path, but
then forwarded off host still containing the remote checksum option.
A remote host will again perform RCO but now the checksum verification
will fail since GRO RCO already modified the checksum.

To fix this, we ensure that GRO restores a packet to it's original
state before returning. In this model, when GRO processes a remote
checksum option it still changes the checksum per the algorithm
but on return from lower layer processing the checksum is restored
to its original value.

In this patch we add define gro_remcsum structure which is passed
to skb_gro_remcsum_process to save offset and delta for the checksum
being changed. After lower layer processing, skb_gro_remcsum_cleanup
is called to restore the checksum before returning from GRO.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26c4f7da

09 2月, 2015 2 次提交

net:rfs: adjust table size checking · 93c1af6c

由 Eric Dumazet 提交于 2月 08, 2015

Make sure root user does not try something stupid.

Also make sure mask field in struct rps_sock_flow_table
does not share a cache line with the potentially often dirtied
flow table.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Fixes: 567e4b79 ("net: rfs: add hash collision detection")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93c1af6c

net: rfs: add hash collision detection · 567e4b79

由 Eric Dumazet 提交于 2月 06, 2015

Receive Flow Steering is a nice solution but suffers from
hash collisions when a mix of connected and unconnected traffic
is received on the host, when flow hash table is populated.

Also, clearing flow in inet_release() makes RFS not very good
for short lived flows, as many packets can follow close().
(FIN , ACK packets, ...)

This patch extends the information stored into global hash table
to not only include cpu number, but upper part of the hash value.

I use a 32bit value, and dynamically split it in two parts.

For host with less than 64 possible cpus, this gives 6 bits for the
cpu number, and 26 (32-6) bits for the upper part of the hash.

Since hash bucket selection use low order bits of the hash, we have
a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big
enough.

If the hash found in flow table does not match, we fallback to RPS (if
it is enabled for the rxqueue).

This means that a packet for an non connected flow can avoid the
IPI through a unrelated/victim CPU.

This also means we no longer have to clear the table at socket
close time, and this helps short lived flows performance.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

567e4b79

05 2月, 2015 2 次提交

net/core: Add event for a change in slave state · 61bd3857

由 Moni Shoua 提交于 2月 03, 2015

Add event which provides an indication on a change in the state
of a bonding slave. The event handler should cast the pointer to the
appropriate type (struct netdev_bonding_info) in order to get the
full info about the slave.
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61bd3857

net: add skb functions to process remote checksum offload · dcdc8994

由 Tom Herbert 提交于 2月 02, 2015

This patch adds skb_remcsum_process and skb_gro_remcsum_process to
perform the appropriate adjustments to the skb when receiving
remote checksum offload.

Updated vxlan and gue to use these functions.

Tested: Ran TCP_RR and TCP_STREAM netperf for VXLAN and GUE, did
not see any change in performance.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dcdc8994

02 2月, 2015 1 次提交

bridge: add flags argument to ndo_bridge_setlink and ndo_bridge_dellink · add511b3

由 Roopa Prabhu 提交于 1月 29, 2015

bridge flags are needed inside ndo_bridge_setlink/dellink handlers to
avoid another call to parse IFLA_AF_SPEC inside these handlers

This is used later in this series
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

add511b3

30 1月, 2015 1 次提交

dev: add per net_device packet type chains · 7866a621

由 Salam Noureddine 提交于 1月 27, 2015

When many pf_packet listeners are created on a lot of interfaces the
current implementation using global packet type lists scales poorly.
This patch adds per net_device packet type lists to fix this problem.

The patch was originally written by Eric Biederman for linux-2.6.29.
Tested on linux-3.16.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NSalam Noureddine <noureddine@arista.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7866a621

15 1月, 2015 2 次提交

netdevice: Add missing parentheses in macro · 4ccce02e

由 Benjamin Poirier 提交于 1月 14, 2015

For example, one could conceivably call
	for_each_netdev_in_bond_rcu(condition ? bond1 : bond2, slave)
and get an unexpected result.
Signed-off-by: NBenjamin Poirier <bpoirier@suse.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ccce02e

udp: pass udp_offload struct to UDP gro callbacks · a2b12f3c

由 Tom Herbert 提交于 1月 12, 2015

This patch introduces udp_offload_callbacks which has the same
GRO functions (but not a GSO function) as offload_callbacks,
except there is an argument to a udp_offload struct passed to
gro_receive and gro_complete functions. This additional argument
can be used to retrieve the per port structure of the encapsulation
for use in gro processing (mostly by doing container_of on the
structure).
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2b12f3c

14 1月, 2015 1 次提交

net: Corrected the comment describing the ndo operations to reflect the actual... · 5d632cb7

由 B Viswanath 提交于 1月 12, 2015

net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations

Corrected the comment describing the ndo operations to
reflect the actual prototype for couple of operations
Signed-off-by: NB Viswanath <marichika4@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d632cb7

27 12月, 2014 1 次提交

net: Generalize ndo_gso_check to ndo_features_check · 5f35227e

由 Jesse Gross 提交于 12月 23, 2014

GSO isn't the only offload feature with restrictions that
potentially can't be expressed with the current features mechanism.
Checksum is another although it's a general issue that could in
theory apply to anything. Even if it may be possible to
implement these restrictions in other ways, it can result in
duplicate code or inefficient per-packet behavior.

This generalizes ndo_gso_check so that drivers can remove any
features that don't make sense for a given packet, similar to
netif_skb_features(). It also converts existing driver
restrictions to the new format, completing the work that was
done to support tunnel protocols since the issues apply to
checksums as well.

By actually removing features from the set that are used to do
offloading, it solves another problem with the existing
interface. In these cases, GSO would run with the original set
of features and not do anything because it appears that
segmentation is not required.

CC: Tom Herbert <therbert@google.com>
CC: Joe Stringer <joestringer@nicira.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: NJesse Gross <jesse@nicira.com>
Acked-by: NTom Herbert <therbert@google.com>
Fixes: 04ffcb25 ("net: Add ndo_gso_check")
Tested-by: NHayes Wang <hayeswang@realtek.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f35227e

10 12月, 2014 2 次提交

ipvlan: move the device check function into netdevice.h · 5933fea7

由 Mahesh Bandewar 提交于 12月 06, 2014

Move the port check [ipvlan_dev_master()] and device check
[ipvlan_dev_slave()] functions to netdevice.h and rename them
netif_is_ipvlan_port() and netif_is_ipvlan() resp. to be
consistent with macvlan api naming.
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5933fea7

netdevice: Add a function to check macvlan port · 2f33e7d5

由 Mahesh Bandewar 提交于 12月 06, 2014

Similar to a check for macvlan device, netif_is_macvlan(), add
another function to check if a device is used as macvlan port.
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f33e7d5

03 12月, 2014 4 次提交

bridge: call netdev_sw_port_stp_update when bridge port STP status changes · 38dcf357

由 Scott Feldman 提交于 11月 28, 2014

To notify switch driver of change in STP state of bridge port, add new
.ndo op and provide switchdev wrapper func to call ndo op. Use it in bridge
code then.
Signed-off-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38dcf357

net: introduce generic switch devices support · 007f790c

由 Jiri Pirko 提交于 11月 28, 2014

The goal of this is to provide a possibility to support various switch
chips. Drivers should implement relevant ndos to do so. Now there is
only one ndo defined:
- for getting physical switch id is in place.

Note that user can use random port netdevice to access the switch.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Reviewed-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

007f790c

net: rename netdev_phys_port_id to more generic name · 02637fce

由 Jiri Pirko 提交于 11月 28, 2014

So this can be reused for identification of other "items" as well.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Reviewed-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02637fce

net: make vid as a parameter for ndo_fdb_add/ndo_fdb_del · f6f6424b

由 Jiri Pirko 提交于 11月 28, 2014

Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple
u16 vid to drivers from there.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6f6424b

25 11月, 2014 1 次提交

ipvlan: Initial check-in of the IPVLAN driver. · 2ad7bf36

由 Mahesh Bandewar 提交于 11月 23, 2014

This driver is very similar to the macvlan driver except that it
uses L3 on the frame to determine the logical interface while
functioning as packet dispatcher. It inherits L2 of the master
device hence the packets on wire will have the same L2 for all
the packets originating from all virtual devices off of the same
master device.

This driver was developed keeping the namespace use-case in
mind. Hence most of the examples given here take that as the
base setup where main-device belongs to the default-ns and
virtual devices are assigned to the additional namespaces.

The device operates in two different modes and the difference
in these two modes in primarily in the TX side.

(a) L2 mode : In this mode, the device behaves as a L2 device.
TX processing upto L2 happens on the stack of the virtual device
associated with (namespace). Packets are switched after that
into the main device (default-ns) and queued for xmit.

RX processing is simple and all multicast, broadcast (if
applicable), and unicast belonging to the address(es) are
delivered to the virtual devices.

(b) L3 mode : In this mode, the device behaves like a L3 device.
TX processing upto L3 happens on the stack of the virtual device
associated with (namespace). Packets are switched to the
main-device (default-ns) for the L2 processing. Hence the routing
table of the default-ns will be used in this mode.

RX processins is somewhat similar to the L2 mode except that in
this mode only Unicast packets are delivered to the virtual device
while main-dev will handle all other packets.

The devices can be added using the "ip" command from the iproute2
package -

	ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Brandon Philips <brandon.philips@coreos.com>
Cc: Pavel Emelianov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ad7bf36

17 11月, 2014 1 次提交

net: provide a per host RSS key generic infrastructure · 960fb622

由 Eric Dumazet 提交于 11月 16, 2014

RSS (Receive Side Scaling) typically uses Toeplitz hash and a 40 or 52 bytes
RSS key.

Some drivers use a constant (and well known key), some drivers use a random
key per port, making bonding setups hard to tune. Well known keys increase
attack surface, considering that number of queues is usually a power of two.

This patch provides infrastructure to help drivers doing the right thing.

netdev_rss_key_fill() should be used by drivers to initialize their RSS key,
even if they provide ethtool -X support to let user redefine the key later.

A new /proc/sys/net/core/netdev_rss_key file can be used to get the host
RSS key even for drivers not providing ethtool -x support, in case some
applications want to precisely setup flows to match some RX queues.

Tested:

myhost:~# cat /proc/sys/net/core/netdev_rss_key
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41:36:40:74:b6:15:ca:27:44:aa:b3:4d:72

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
0: 0 1 2 3 4 5 6 7
RSS hash key:
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

960fb622

12 11月, 2014 1 次提交

net: kill netif_copy_real_num_queues() · 09626e9d

由 WANG Cong 提交于 11月 11, 2014

vlan was the only user of netif_copy_real_num_queues(),
but it no longer calls it after
commit 4af429d2 ("vlan: lockless transmit path").
So we can just remove it.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09626e9d

11 11月, 2014 1 次提交

net: gro: add a per device gro flush timer · 3b47d303

由 Eric Dumazet 提交于 11月 06, 2014

Tuning coalescing parameters on NIC can be really hard.

Servers can handle both bulk and RPC like traffic, with conflicting
goals : bulk flows want as big GRO packets as possible, RPC want minimal
latencies.

To reach big GRO packets on 10Gbe NIC, one can use :

ethtool -C eth0 rx-usecs 4 rx-frames 44

But this penalizes rpc sessions, with an increase of latencies, up to
50% in some cases, as NICs generally do not force an interrupt when
a packet with TCP Push flag is received.

Some NICs do not have an absolute timer, only a timer rearmed for every
incoming packet.

This patch uses a different strategy : Let GRO stack decides what do do,
based on traffic pattern.

Packets with Push flag wont be delayed.
Packets without Push flag might be held in GRO engine, if we keep
receiving data.

This new mechanism is off by default, and shall be enabled by setting
/sys/class/net/ethX/gro_flush_timeout to a value in nanosecond.

To fully enable this mechanism, drivers should use napi_complete_done()
instead of napi_complete().

Tested:
 Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)

Without this feature, we send back about 305,000 ACK per second.

GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)

Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)

Receiver performs less calls to upper stacks, less wakes up.
This also reduces cpu usage on the sender, as it receives less ACK
packets.

Note that reducing number of wakes up increases cpu efficiency, but can
decrease QPS, as applications wont have the chance to warmup cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00
0.00      0.50

B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00
0.00      0.50
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b47d303

06 11月, 2014 2 次提交

net: Remove MPLS GSO feature. · 59b93b41

由 Pravin B Shelar 提交于 11月 05, 2014

Device can export MPLS GSO support in dev->mpls_features same way
it export vlan features in dev->vlan_features. So it is safe to
remove NETIF_F_GSO_MPLS redundant flag.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>

59b93b41

udp: Changes to udp_offload to support remote checksum offload · e585f236

由 Tom Herbert 提交于 11月 04, 2014

Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates remote
checksum offload being done (in this case inner checksum must not
be offloaded to the NIC).

Added logic in __skb_udp_tunnel_segment to handle remote checksum
offload case.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e585f236

04 11月, 2014 2 次提交

netdevice: add ieee802154_ptr to net_device · 98a18b6f

由 Alexander Aring 提交于 11月 02, 2014

This patch adds an ieee802154_ptr to the net_device structure.
Furthermore the 802.15.4 subsystem will introduce a nl802154 framework
which is similar like the nl80211 framework and a wpan_dev structure.
The wpan_dev structure will hold additional net_device attributes like
address options which are 802.15.4 specific. In the upcoming nl802154
implementation we will introduce a NL802154_FLAG_NEED_WPAN_DEV like
NL80211_FLAG_NEED_WDEV. For this flag an ieee802154_ptr in net_device is
needed. Additional we can access the wpan_dev attributes in upper layers
like IEEE 802.15.4 6LoWPAN easily. Current solution is a complicated
callback interface and getting these values over subif data structure
in mac802154.
Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

98a18b6f

net: shrink struct softnet_data · 4cdb1e2e

由 Eric Dumazet 提交于 11月 02, 2014

flow_limit in struct softnet_data is only read from local cpu
and can be moved to fill a hole, reducing softnet_data size by
64 bytes on x86_64

While we are at it, move output_queue, output_queue_tailp and
completion_queue, so that rx / tx paths touch a single cache line.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cdb1e2e

30 10月, 2014 1 次提交

net: introduce napi_schedule_irqoff() · bc9ad166

由 Eric Dumazet 提交于 10月 28, 2014

napi_schedule() can be called from any context and has to mask hard
irqs.

Add a variant that can only be called from hard interrupts handlers
or when irqs are already masked.

Many NIC drivers can use it from their hard IRQ handler instead of
generic variant.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc9ad166

16 10月, 2014 1 次提交

net: Add ndo_gso_check · 04ffcb25

由 Tom Herbert 提交于 10月 14, 2014

Add ndo_gso_check which a device can define to indicate whether is
is capable of doing GSO on a packet. This funciton would be called from
the stack to determine whether software GSO is needed to be done. A
driver should populate this function if it advertises GSO types for
which there are combinations that it wouldn't be able to handle. For
instance a device that performs UDP tunneling might only implement
support for transparent Ethernet bridging type of inner packets
or might have limitations on lengths of inner headers.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04ffcb25

09 10月, 2014 1 次提交

net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers · 53511453

由 Eric Dumazet 提交于 10月 08, 2014

Add two helpers so that drivers do not have to care of BQL being
available or not.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NJim Davis <jim.epost@gmail.com>
Fixes: 29d40c90 ("net/mlx4_en: Use prefetch in tx path")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

53511453

08 10月, 2014 1 次提交

net: better IFF_XMIT_DST_RELEASE support · 02875878

由 Eric Dumazet 提交于 10月 05, 2014

Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
	netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02875878

07 10月, 2014 1 次提交

net: introduce netdevice gso_min_segs attribute · fcbeb976

由 Eric Dumazet 提交于 10月 05, 2014

Some TSO engines might have a too heavy setup cost, that impacts
performance on hosts sending small bursts (2 MSS per packet).

This patch adds a device gso_min_segs, allowing drivers to set
a minimum segment size for TSO packets, according to the NIC
performance.

Tested on a mlx4 NIC, this allows to get a ~110% increase of
throughput when sending 2 MSS per packet.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fcbeb976

04 10月, 2014 2 次提交

fou: eliminate IPv4,v6 specific GRO functions · efc98d08

由 Tom Herbert 提交于 10月 03, 2014

This patch removes fou[46]_gro_receive and fou[46]_gro_complete
functions. The v4 or v6 variants were chosen for the UDP offloads
based on the address family of the socket this is not necessary
or correct. Alternatively, this patch adds is_ipv6 to napi_gro_skb.
This is set in udp6_gro_receive and unset in udp4_gro_receive. In
fou_gro_receive the value is used to select the correct inet_offloads
for the protocol of the outer IP header.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

efc98d08

qdisc: validate skb without holding lock · 55a93b3e

由 Eric Dumazet 提交于 10月 03, 2014

Validation of skb can be pretty expensive :

GSO segmentation and/or checksum computations.

We can do this without holding qdisc lock, so that other cpus
can queue additional packets.

Trick is that requeued packets were already validated, so we carry
a boolean so that sch_direct_xmit() can validate a fresh skb list,
or directly use an old one.

Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads
host.

Turning TSO on or off had no effect on throughput, only few more cpu
cycles. Lock contention on qdisc lock disappeared.

Same if disabling TX checksum offload.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

55a93b3e

27 9月, 2014 1 次提交

net: Change netdev_<level> logging functions to return void · 6ea754eb

由 Joe Perches 提交于 9月 22, 2014

No caller or macro uses the return value so make all
the functions return void.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6ea754eb

26 9月, 2014 1 次提交

net: Remove gso_send_check as an offload callback · 53e50398

由 Tom Herbert 提交于 9月 20, 2014

The send_check logic was only interesting in cases of TCP offload and
UDP UFO where the checksum needed to be initialized to the pseudo
header checksum. Now we've moved that logic into the related
gso_segment functions so gso_send_check is no longer needed.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

53e50398

20 9月, 2014 1 次提交

fou: Add GRO support · afe93325

由 Tom Herbert 提交于 9月 17, 2014

Implement fou_gro_receive and fou_gro_complete, and populate these
in the correponsing udp_offloads for the socket. Added ipproto to
udp_offloads and pass this from UDP to the fou GRO routine in proto
field of napi_gro_cb structure.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

afe93325