提交 · 46e5da40aec256155cfedee96dd21a75da941f2c · openanolis / cloud-kernel

14 9月, 2014 1 次提交

net: qdisc: use rcu prefix and silence sparse warnings · 46e5da40

由 John Fastabend 提交于 9月 12, 2014

Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.

Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46e5da40

04 9月, 2014 1 次提交

qdisc: validate frames going through the direct_xmit path · 1f59533f

由 Jesper Dangaard Brouer 提交于 9月 03, 2014

In commit 50cbe9ab ("net: Validate xmit SKBs right when we
pull them out of the qdisc") the validation code was moved out of
dev_hard_start_xmit and into dequeue_skb.

However this overlooked the fact that we do not always enqueue
the skb onto a qdisc. First situation is if qdisc have flag
TCQ_F_CAN_BYPASS and qdisc is empty.  Second situation is if
there is no qdisc on the device, which is a common case for
software devices.

Originally spotted and inital patch by Alexander Duyck.
As a result Alex was seeing issues trying to connect to a
vhost_net interface after commit 50cbe9ab was applied.

Added a call to validate_xmit_skb() in __dev_xmit_skb(), in the
code path for qdiscs with TCQ_F_CAN_BYPASS flag, and in
__dev_queue_xmit() when no qdisc.

Also handle the error situation where dev_hard_start_xmit() could
return a skb list, and does not return dev_xmit_complete(rc) and
falls through to the kfree_skb(), in that situation it should
call kfree_skb_list().

Fixes:  50cbe9ab ("net: Validate xmit SKBs right when we pull them out of the qdisc")
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f59533f

02 9月, 2014 10 次提交

net: Support for csum_bad in skbuff · 5a212329

由 Tom Herbert 提交于 8月 31, 2014

This flag indicates that an invalid checksum was detected in the
packet. __skb_mark_checksum_bad helper function was added to set this.

Checksums can be marked bad from a driver or the GRO path (the latter
is implemented in this patch). csum_bad is checked in
__skb_checksum_validate_complete (i.e. calling that when ip_summed ==
CHECKSUM_NONE).

csum_bad works in conjunction with ip_summed value. In the case that
ip_summed is CHECKSUM_NONE and csum_bad is set, this implies that the
first (or next) checksum encountered in the packet is bad. When
ip_summed is CHECKSUM_UNNECESSARY, the first checksum after the last
one validated is bad. For example, if ip_summed == CHECKSUM_UNNECESSARY,
csum_level == 1, and csum_bad is set-- then the third checksum in the
packet is bad. In the normal path, the packet will be dropped when
processing the protocol layer of the bad checksum:
__skb_decr_checksum_unnecessary called twice for the good checksums
changing ip_summed to CHECKSUM_NONE so that
__skb_checksum_validate_complete is called to validate the third
checksum and that will fail since csum_bad is set.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a212329

net: xmit_list() becomes dev_hard_start_xmit(). · 8dcda22a

由 David S. Miller 提交于 9月 01, 2014

Now fundamentally we can process lists of SKBs as cheaply
as single packets.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8dcda22a

net: Don't keep around original SKB when we software segment GSO frames. · ce93718f

由 David S. Miller 提交于 8月 30, 2014

Just maintain the list properly by returning the head of the remaining
SKB list from dev_hard_start_xmit().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce93718f

D
net: Validate xmit SKBs right when we pull them out of the qdisc. · 50cbe9ab
由 David S. Miller 提交于 8月 30, 2014
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
50cbe9ab

net: Separate out SKB validation logic from transmit path. · eae3f88e

由 David S. Miller 提交于 8月 30, 2014

dev_hard_start_xmit() does two things, it first validates and
canonicalizes the SKB, then it actually sends it.

Make a set of helper functions for doing the first part.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eae3f88e

D
net: Have xmit_list() signal more==true when appropriate. · 95f6b3dd
由 David S. Miller 提交于 8月 29, 2014
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
95f6b3dd
D
net: Pass a "more" indication down into netdev_start_xmit() code paths. · fa2dbdc2
由 David S. Miller 提交于 8月 29, 2014
```
For now it will always be false.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
fa2dbdc2

net: Move main gso loop out of dev_hard_start_xmit() into helper. · 7f2e870f

由 David S. Miller 提交于 8月 29, 2014

There is a slight policy change happening here as well.

The previous code would drop the entire rest of the GSO skb if any of
them got, for example, a congestion notification.

That makes no sense, anything NET_XMIT_MASK and below is something
like congestion or policing.  And in the congestion case it doesn't
even mean the packet was actually dropped.

Just continue until dev_xmit_complete() evaluates to false.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f2e870f

D
net: Create xmit_one() helper for dev_hard_start_xmit() · 2ea25513
由 David S. Miller 提交于 8月 29, 2014
```
Hopefully making the code a bit easier to read and digest.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
2ea25513

net: Do txq_trans_update() in netdev_start_xmit() · 10b3ad8c

由 David S. Miller 提交于 8月 29, 2014

That way we don't have to audit every call site to make sure it is
doing this properly.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10b3ad8c

30 8月, 2014 1 次提交

net: Allow GRO to use and set levels of checksum unnecessary · 662880f4

由 Tom Herbert 提交于 8月 27, 2014

Allow GRO path to "consume" checksums provided in CHECKSUM_UNNECESSARY
and to report new checksums verfied for use in fallback to normal
path.

Change GRO checksum path to track csum_level using a csum_cnt field
in NAPI_GRO_CB. On GRO initialization, if ip_summed is
CHECKSUM_UNNECESSARY set NAPI_GRO_CB(skb)->csum_cnt to
skb->csum_level + 1. For each checksum verified, decrement
NAPI_GRO_CB(skb)->csum_cnt while its greater than zero. If a checksum
is verfied and NAPI_GRO_CB(skb)->csum_cnt == 0, we have verified a
deeper checksum than originally indicated in skbuf so increment
csum_level (or initialize to CHECKSUM_UNNECESSARY if ip_summed is
CHECKSUM_NONE or CHECKSUM_COMPLETE).
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

662880f4

26 8月, 2014 2 次提交

net: fix checksum features handling in netif_skb_features() · db115037

由 Michal Kubeček 提交于 8月 25, 2014

This is follow-up to

  da08143b ("vlan: more careful checksum features handling")

which introduced more careful feature intersection in vlan code,
taking into account that HW_CSUM should be considered superset
of IP_CSUM/IPV6_CSUM. The same is needed in netif_skb_features()
in order to avoid offloading mismatch warning when vlan is
created on top of a bond consisting of slaves supporting IP/IPv6
checksumming but not vlan Tx offloading.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db115037

net: prevent of emerging cross-namespace symlinks · 4c75431a

由 Alexander Y. Fomichev 提交于 8月 25, 2014

Code manipulating sysfs symlinks on adjacent net_devices(s)
currently doesn't take into account that devices potentially
belong to different namespaces.

This patch trying to fix an issue as follows:
- check for net_ns before creating / deleting symlink.
  for now only netdev_adjacent_rename_links and
  __netdev_adjacent_dev_remove are affected, afaics
  __netdev_adjacent_dev_insert implies both net_devs
  belong to the same namespace.
- Drop all existing symlinks to / from all adj_devs before
  switching namespace and recreate them just after.
Signed-off-by: NAlexander Y. Fomichev <git.user@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c75431a

25 8月, 2014 2 次提交

D
net: Add ops->ndo_xmit_flush() · 4798248e
由 David S. Miller 提交于 8月 22, 2014
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
4798248e

net: skb_gro_checksum_* functions · 573e8fca

由 Tom Herbert 提交于 8月 22, 2014

Add skb_gro_checksum_validate, skb_gro_checksum_validate_zero_check,
and skb_gro_checksum_simple_validate, and __skb_gro_checksum_complete.
These are the cognates of the normal checksum functions but are used
in the gro_receive path and operate on GRO related fields in sk_buffs.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

573e8fca

24 8月, 2014 1 次提交

net: use reciprocal_scale() helper · 8fc54f68

由 Daniel Borkmann 提交于 8月 23, 2014

Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale().
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fc54f68

12 8月, 2014 1 次提交

net: Always untag vlan-tagged traffic on input. · 0d5501c1

由 Vlad Yasevich 提交于 8月 08, 2014

Currently the functionality to untag traffic on input resides
as part of the vlan module and is build only when VLAN support
is enabled in the kernel.  When VLAN is disabled, the function
vlan_untag() turns into a stub and doesn't really untag the
packets.  This seems to create an interesting interaction
between VMs supporting checksum offloading and some network drivers.

There are some drivers that do not allow the user to change
tx-vlan-offload feature of the driver.  These drivers also seem
to assume that any VLAN-tagged traffic they transmit will
have the vlan information in the vlan_tci and not in the vlan
header already in the skb.  When transmitting skbs that already
have tagged data with partial checksum set, the checksum doesn't
appear to be updated correctly by the card thus resulting in a
failure to establish TCP connections.

The following is a packet trace taken on the receiver where a
sender is a VM with a VLAN configued.  The host VM is running on
doest not have VLAN support and the outging interface on the
host is tg3:
10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
(0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
offset 0, flags [DF], proto TCP (6), length 60)
    10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
-> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
4294837885 ecr 0,nop,wscale 7], length 0
10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
(0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
offset 0, flags [DF], proto TCP (6), length 60)
    10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
-> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
4294838888 ecr 0,nop,wscale 7], length 0

This connection finally times out.

I've only access to the TG3 hardware in this configuration thus have
only tested this with TG3 driver.  There are a lot of other drivers
that do not permit user changes to vlan acceleration features, and
I don't know if they all suffere from a similar issue.

The patch attempt to fix this another way.  It moves the vlan header
stipping code out of the vlan module and always builds it into the
kernel network core.  This way, even if vlan is not supported on
a virtualizatoin host, the virtual machines running on top of such
host will still work with VLANs enabled.

CC: Patrick McHardy <kaber@trash.net>
CC: Nithin Nayak Sujir <nsujir@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
Acked-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d5501c1

06 8月, 2014 1 次提交

net-timestamp: SCHED timestamp on entering packet scheduler · e7fd2885

由 Willem de Bruijn 提交于 8月 04, 2014

Kernel transmit latency is often incurred in the packet scheduler.
Introduce a new timestamp on transmission just before entering the
scheduler. When data travels through multiple devices (bonding,
tunneling, ...) each device will export an individual timestamp.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e7fd2885

31 7月, 2014 1 次提交

net: Remove unlikely() for WARN_ON() conditions · 80019d31

由 Thomas Graf 提交于 7月 30, 2014

No need for the unlikely(), WARN_ON() and BUG_ON() internally use
unlikely() on the condition.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

80019d31

21 7月, 2014 2 次提交

net: print a notification on device rename · 6fe82a39

由 Veaceslav Falico 提交于 7月 17, 2014

Currently it's done silently (from the kernel part), and thus it might be
hard to track the renames from logs.

Add a simple netdev_info() to notify the rename, but only in case the
previous name was valid.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: stephen hemminger <stephen@networkplumber.org>
CC: Jerry Chu <hkchu@google.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: NVeaceslav Falico <vfalico@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6fe82a39

net: print net_device reg_state in netdev_* unless it's registered · ccc7f496

由 Veaceslav Falico 提交于 7月 17, 2014

This way we'll always know in what status the device is, unless it's
running normally (i.e. NETDEV_REGISTERED).

Also, emit a warning once in case of a bad reg_state.

CC: "David S. Miller" <davem@davemloft.net>
CC: Jason Baron <jbaron@akamai.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: stephen hemminger <stephen@networkplumber.org>
CC: Jerry Chu <hkchu@google.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: Joe Perches <joe@perches.com>
Signed-off-by: NVeaceslav Falico <vfalico@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ccc7f496

17 7月, 2014 2 次提交

net: remove open-coded skb_cow_head. · a40e0a66

由 françois romieu 提交于 7月 15, 2014

Signed-off-by: NFrancois Romieu <romieu@fr.zoreil.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a40e0a66

net-gre-gro: Fix a bug that breaks the forwarding path · c3caf119

由 Jerry Chu 提交于 7月 14, 2014

Fixed a bug that was introduced by my GRE-GRO patch
(bf5a755f net-gre-gro: Add GRE
support to the GRO stack) that breaks the forwarding path
because various GSO related fields were not set. The bug will
cause on the egress path either the GSO code to fail, or a
GRE-TSO capable (NETIF_F_GSO_GRE) NICs to choke. The following
fix has been tested for both cases.
Signed-off-by: NH.K. Jerry Chu <hkchu@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3caf119

16 7月, 2014 2 次提交

net: set name_assign_type in alloc_netdev() · c835a677

由 Tom Gundersen 提交于 7月 14, 2014

Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
all users to pass NET_NAME_UNKNOWN.

Coccinelle patch:

@@
expression sizeof_priv, name, setup, txqs, rxqs, count;
@@

(
-alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
+alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
|
-alloc_netdev_mq(sizeof_priv, name, setup, count)
+alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
|
-alloc_netdev(sizeof_priv, name, setup)
+alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
)

v9: move comments here from the wrong commit
Signed-off-by: NTom Gundersen <teg@jklm.no>
Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c835a677

net: set name assign type for renamed devices · 238fa362

由 Tom Gundersen 提交于 7月 14, 2014

Based on a patch from David Herrmann.

This is the only place devices can be renamed.

v9: restore revers-christmas-tree order of local variables
Signed-off-by: NTom Gundersen <teg@jklm.no>
Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

238fa362

08 7月, 2014 2 次提交

net: Fix NETDEV_CHANGE notifier usage causing spurious arp flush · 54951194

由 Loic Prylli 提交于 7月 01, 2014

A bug was introduced in NETDEV_CHANGE notifier sequence causing the
arp table to be sometimes spuriously cleared (including manual arp
entries marked permanent), upon network link carrier changes.

The changed argument for the notifier was applied only to a single
caller of NETDEV_CHANGE, missing among others netdev_state_change().
So upon net_carrier events induced by the network, which are
triggering a call to netdev_state_change(), arp_netdev_event() would
decide whether to clear or not arp cache based on random/junk stack
values (a kind of read buffer overflow).

Fixes: be9efd36 ("net: pass changed flags along with NETDEV_CHANGE event")
Fixes: 6c8b4e3f ("arp: flush arp cache on IFF_NOARP change")
Signed-off-by: NLoic Prylli <loicp@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54951194

net: Performance fix for process_backlog · 11ef7a89

由 Tom Herbert 提交于 6月 30, 2014

In process_backlog the input_pkt_queue is only checked once for new
packets and quota is artificially reduced to reflect precisely the
number of packets on the input_pkt_queue so that the loop exits
appropriately.

This patches changes the behavior to be more straightforward and
less convoluted. Packets are processed until either the quota
is met or there are no more packets to process.

This patch seems to provide a small, but noticeable performance
improvement. The performance improvement is a result of staying
in the process_backlog loop longer which can reduce number of IPI's.

Performance data using super_netperf TCP_RR with 200 flows:

Before fix:

88.06% CPU utilization
125/190/309 90/95/99% latencies
1.46808e+06 tps
1145382 intrs.sec.

With fix:

87.73% CPU utilization
122/183/296 90/95/99% latencies
1.4921e+06 tps
1021674.30 intrs./sec.
Signed-off-by: NTom Herbert <therbert@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

11ef7a89

02 7月, 2014 2 次提交

rtnetlink: allow to register ops without ops->setup set · b0ab2fab

由 Jiri Pirko 提交于 6月 26, 2014

So far, it is assumed that ops->setup is filled up. But there might be
case that ops might make sense even without ->setup. In that case,
forbid to newlink and dellink.

This allows to register simple rtnl link ops containing only ->kind.
That allows consistent way of passing device kind (either device-kind or
slave-kind) to userspace.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b0ab2fab

net: fix some typos in comment · 9bf2b8c2

由 Ying Xue 提交于 6月 26, 2014

In commit 37112105("net:
QDISC_STATE_RUNNING dont need atomic bit ops") the
__QDISC_STATE_RUNNING is renamed to __QDISC___STATE_RUNNING,
but the old names existing in comment are not replaced with
the new name completely.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bf2b8c2

18 6月, 2014 1 次提交

net: delete duplicate dev_set_rx_mode() call · d215d10f

由 Peter Pan(潘卫平) 提交于 6月 16, 2014

In __dev_open(), it already calls dev_set_rx_mode().
and dev_set_rx_mode() has no effect for a net device which does not have
IFF_UP flag set.

So the call of dev_set_rx_mode() is duplicate in __dev_change_flags().
Signed-off-by: NWeiping Pan <panweiping3@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d215d10f

09 6月, 2014 1 次提交

net: force a list_del() in unregister_netdevice_many() · 87757a91

由 Eric Dumazet 提交于 6月 06, 2014

unregister_netdevice_many() API is error prone and we had too
many bugs because of dangling LIST_HEAD on stacks.

See commit f87e6f47 ("net: dont leave active on stack LIST_HEAD")

In fact, instead of making sure no caller leaves an active list_head,
just force a list_del() in the callee. No one seems to need to access
the list after unregister_netdevice_many()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87757a91

06 6月, 2014 1 次提交

MPLS: Use mpls_features to activate software MPLS GSO segmentation · 3b392ddb

由 Simon Horman 提交于 6月 04, 2014

If an MPLS packet requires segmentation then use mpls_features
to determine if the software implementation should be used.

As no driver advertises MPLS GSO segmentation this will always be
the case.

I had not noticed that this was necessary before as software MPLS GSO
segmentation was already being used in my test environment. I believe that
the reason for that is the skbs in question always had fragments and the
driver I used does not advertise NETIF_F_FRAGLIST (which seems to be the
case for most drivers). Thus software segmentation was activated by
skb_gso_ok().

This introduces the overhead of an extra call to skb_network_protocol()
in the case where where CONFIG_NET_MPLS_GSO is set and
skb->ip_summed == CHECKSUM_NONE.

Thanks to Jesse Gross for prompting me to investigate this.
Signed-off-by: NSimon Horman <horms@verge.net.au>
Acked-by: NYAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b392ddb

05 6月, 2014 1 次提交

net: use the new API kvfree() · 4cb28970

由 WANG Cong 提交于 6月 02, 2014

It is available since v3.15-rc5.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cb28970

04 6月, 2014 1 次提交

net: remove some unless free on failure in alloc_netdev_mqs() · 92ff71b8

由 WANG Cong 提交于 6月 03, 2014

When we jump to free_pcpu on failure in alloc_netdev_mqs()
rx and tx queues are not yet allocated, so no need to free them.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92ff71b8

02 6月, 2014 1 次提交

net: fix wrong mac_len calculation for vlans · 4b9b1cdf

由 Nikolay Aleksandrov 提交于 5月 28, 2014

After 1e785f48 ("net: Start with correct mac_len in
skb_network_protocol") skb->mac_len is used as a start of the
calculation in skb_network_protocol() but that is not always correct. If
skb->protocol == 8021Q/AD, usually the vlan header is already inserted
in the skb (i.e. vlan reorder hdr == 0). Usually when the packet enters
dev_hard_xmit it has mac_len == 0 so we take 2 bytes from the
destination mac address (skb->data + VLAN_HLEN) as a type in
skb_network_protocol() and return vlan_depth == 4. In the case where TSO is
off, then the mac_len is set but it's == 18 (ETH_HLEN + VLAN_HLEN), so
skb_network_protocol() returns a type from inside the packet and
offset == 22. Also make vlan_depth unsigned as suggested before.
As suggested by Eric Dumazet, move the while() loop in the if() so we
can avoid additional testing in fast path.

Here are few netperf tests + debug printk's to illustrate:
cat netperf.tso-on.reorder-on.bugged
- Vlan -> device (reorder on, default, this case is okay)
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    7111.54
[   81.605435] skb->len 65226 skb->gso_size 1448 skb->proto 0x800
skb->mac_len 0 vlan_depth 0 type 0x800

- Vlan -> device (reorder off, bad)
cat netperf.tso-on.reorder-off.bugged
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00     241.35
[  204.578332] skb->len 1518 skb->gso_size 0 skb->proto 0x8100
skb->mac_len 0 vlan_depth 4 type 0x5301
0x5301 are the last two bytes of the destination mac.

And if we stop TSO, we may get even the following:
[   83.343156] skb->len 2966 skb->gso_size 1448 skb->proto 0x8100
skb->mac_len 18 vlan_depth 22 type 0xb84
Because mac_len already accounts for VLAN_HLEN.

After the fix:
cat netperf.tso-on.reorder-off.fixed
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.01    5001.46
[   81.888489] skb->len 65230 skb->gso_size 1448 skb->proto 0x8100
skb->mac_len 0 vlan_depth 18 type 0x800

CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Borkman <dborkman@redhat.com>
CC: David S. Miller <davem@davemloft.net>

Fixes:1e785f48 ("net: Start with correct mac_len in
skb_network_protocol")
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b9b1cdf

17 5月, 2014 3 次提交

bonding: Fix stacked device detection in arp monitoring · 44a40855

由 Vlad Yasevich 提交于 5月 16, 2014

Prior to commit fbd929f2
	bonding: support QinQ for bond arp interval

the arp monitoring code allowed for proper detection of devices
stacked on top of vlans.  Since the above commit, the
code can still detect a device stacked on top of single
vlan, but not a device stacked on top of Q-in-Q configuration.
The search will only set the inner vlan tag if the route
device is the vlan device.  However, this is not always the
case, as it is possible to extend the stacked configuration.

With this patch it is possible to provision devices on
top Q-in-Q vlan configuration that should be used as
a source of ARP monitoring information.

For example:
ip link add link bond0 vlan10 type vlan proto 802.1q id 10
ip link add link vlan10 vlan100 type vlan proto 802.1q id 100
ip link add link vlan100 type macvlan

Note:  This patch limites the number of stacked VLANs to 2,
just like before.  The original, however had another issue
in that if we had more then 2 levels of VLANs, we would end
up generating incorrectly tagged traffic.  This is no longer
possible.

Fixes: fbd929f2 (bonding: support QinQ for bond arp interval)
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@redhat.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Patric McHardy <kaber@trash.net>
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44a40855

vlan: Fix lockdep warning with stacked vlan devices. · d38569ab

由 Vlad Yasevich 提交于 5月 16, 2014

This reverts commit dc8eaaa0.
	vlan: Fix lockdep warning when vlan dev handle notification

Instead we use the new new API to find the lock subclass of
our vlan device.  This way we can support configurations where
vlans are interspersed with other devices:
  bond -> vlan -> macvlan -> vlan
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d38569ab

net: Find the nesting level of a given device by type. · 4085ebe8

由 Vlad Yasevich 提交于 5月 16, 2014

Multiple devices in the kernel can be stacked/nested and they
need to know their nesting level for the purposes of lockdep.
This patch provides a generic function that determines a nesting
level of a particular device by its type (ex: vlan, macvlan, etc).
We only care about nesting of the same type of devices.

For example:
  eth0 <- vlan0.10 <- macvlan0 <- vlan1.20

The nesting level of vlan1.20 would be 1, since there is another vlan
in the stack under it.
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4085ebe8

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功