提交 · 3556485f1595e3964ba539e39ea682acbb835cee · openanolis / cloud-kernel

07 3月, 2012 1 次提交
- D
  net: Use bool for return value of dev_valid_name(). · 95f050bf
  由 David S. Miller 提交于 3月 06, 2012
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  95f050bf
06 3月, 2012 1 次提交

net: export netdev_stats_to_stats64 · 77a1abf5

由 Eric Dumazet 提交于 3月 05, 2012

Some drivers use internal netdev stats member to store part of their
stats, yet advertize ndo_get_stats64() to implement some 64bit fields.

Allow them to use netdev_stats_to_stats64() helper to make the copy of
netdev stats before they compute their 64bit counters.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77a1abf5

24 2月, 2012 1 次提交

static keys: Introduce 'struct static_key', static_key_true()/false() and... · c5905afb

由 Ingo Molnar 提交于 2月 24, 2012

static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()

So here's a boot tested patch on top of Jason's series that does
all the cleanups I talked about and turns jump labels into a
more intuitive to use facility. It should also address the
various misconceptions and confusions that surround jump labels.

Typical usage scenarios:

        #include <linux/static_key.h>

        struct static_key key = STATIC_KEY_INIT_TRUE;

        if (static_key_false(&key))
                do unlikely code
        else
                do likely code

Or:

        if (static_key_true(&key))
                do likely code
        else
                do unlikely code

The static key is modified via:

        static_key_slow_inc(&key);
        ...
        static_key_slow_dec(&key);

The 'slow' prefix makes it abundantly clear that this is an
expensive operation.

I've updated all in-kernel code to use this everywhere. Note
that I (intentionally) have not pushed through the rename
blindly through to the lowest levels: the actual jump-label
patching arch facility should be named like that, so we want to
decouple jump labels from the static-key facility a bit.

On non-jump-label enabled architectures static keys default to
likely()/unlikely() branches.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NJason Baron <jbaron@redhat.com>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: a.p.zijlstra@chello.nl
Cc: mathieu.desnoyers@efficios.com
Cc: davem@davemloft.net
Cc: ddaney.cavm@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.huSigned-off-by: NIngo Molnar <mingo@elte.hu>

c5905afb

09 2月, 2012 2 次提交

gro: more generic L2 header check · 5ca3b72c

由 Eric Dumazet 提交于 2月 08, 2012

Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
only, and failed on IB/ipoib traffic.

He provided a patch faking a zeroed header to let GRO aggregates frames.

Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header
check to be more generic, ie not assuming L2 header is 14 bytes, but
taking into account hard_header_len.

__napi_gro_receive() has special handling for the common case (Ethernet)
to avoid a memcmp() call and use an inline optimized function instead.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Reported-by: NShlomo Pongratz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5ca3b72c

gro: more generic L2 header check · 43480aec

由 Eric Dumazet 提交于 2月 08, 2012

Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
only, and failed on IB/ipoib traffic.

He provided a patch faking a zeroed header to let GRO aggregates frames.

Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header
check to be more generic, ie not assuming L2 header is 14 bytes, but
taking into account hard_header_len.

__napi_gro_receive() has special handling for the common case (Ethernet)
to avoid a memcmp() call and use an inline optimized function instead.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Reported-by: NShlomo Pongratz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

43480aec

02 2月, 2012 1 次提交

PATCH V2 net-next] net: dev: Convert printks to pr_<level> · 7b6cd1ce

由 Joe Perches 提交于 2月 01, 2012

Use the current logging style.
Coalesce formats where appropriate.
Update grammar where appropriate.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7b6cd1ce

18 1月, 2012 2 次提交

net: fix NULL-deref in WARN() in skb_gso_segment() · 65e9d2fa

由 Michał Mirosław 提交于 1月 17, 2012

Bug was introduced in commit c8f44aff.
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65e9d2fa

net: WARN if skb_checksum_help() is called on skb requiring segmentation · 36c92474

由 Ben Hutchings 提交于 1月 17, 2012

skb_checksum_help() has never done anything useful with skbs that
require segmentation.  Setting skb->ip_summed = CHECKSUM_NONE makes
them invalid and provokes a later WARNing in skb_gso_segment().

Passing such an skb to skb_checksum_help() indicates a bug, so we
should warn about it immediately.  Move the warning from
skb_gso_segment() into a shared function, and add gso_type and
gso_size to it.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

36c92474

17 1月, 2012 1 次提交

net: Use device model to get driver name in skb_gso_segment() · e52ac339

由 Ben Hutchings 提交于 1月 16, 2012

ethtool operations generally require the caller to hold RTNL and are
not safe to call in atomic context.  The device model provides this
information for most devices; we'll only lose it for some old ISA
drivers.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e52ac339

02 12月, 2011 1 次提交

net: net_device flags is an unsigned int · b536db93

由 Eric Dumazet 提交于 11月 30, 2011

commit b00055aa ([NET] core: add RFC2863 operstate) changed
net_device flags from unsigned short to unsigned int.

Some core functions still assume its an unsigned short.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b536db93

01 12月, 2011 1 次提交

net/core: fix rollback handler in register_netdevice_notifier · 8f891489

由 RongQing.Li 提交于 11月 30, 2011

Within nested statements, the break statement terminates only the
do, for, switch, or while statement that immediately encloses it,
So replace the break with goto.
Signed-off-by: NRongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f891489

30 11月, 2011 3 次提交

net: Fix skb_update_prio RCU usage. · 6977a79d

由 Igor Maravic 提交于 11月 25, 2011

Change function rcu_dereference to rcu_dereference_bh to avoid warning

[ INFO: suspicious RCU usage. ]
-------------------------------
net/core/dev.c:2459 suspicious rcu_dereference_check() usage!

because we are locking with

rcu_read_lock_bh();

in function dev_queue_xmit(struct sk_buff *skb)
Signed-off-by: NIgor Maravic <igorm@etf.rs>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6977a79d

bql: Byte queue limits · 114cf580

由 Tom Herbert 提交于 11月 28, 2011

Networking stack support for byte queue limits, uses dynamic queue
limits library.  Byte queue limits are maintained per transmit queue,
and a dql structure has been added to netdev_queue structure for this
purpose.

Configuration of bql is in the tx-<n> sysfs directory for the queue
under the byte_queue_limits directory.  Configuration includes:
limit_min, bql minimum limit
limit_max, bql maximum limit
hold_time, bql slack hold time

Also under the directory are:
limit, current byte limit
inflight, current number of bytes on the queue
Signed-off-by: NTom Herbert <therbert@google.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

114cf580

net: Add queue state xoff flag for stack · 73466498

由 Tom Herbert 提交于 11月 28, 2011

Create separate queue state flags so that either the stack or drivers
can turn on XOFF.  Added a set of functions used in the stack to determine
if a queue is really stopped (either by stack or driver)
Signed-off-by: NTom Herbert <therbert@google.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73466498

29 11月, 2011 3 次提交

net: dont call jump_label_dec from irq context · b90e5794

由 Eric Dumazet 提交于 11月 28, 2011

Igor Maravic reported an error caused by jump_label_dec() being called
from IRQ context :

 BUG: sleeping function called from invalid context at kernel/mutex.c:271
 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper
 1 lock held by swapper/0:
  #0:  (&n->timer){+.-...}, at: [<ffffffff8107ce90>] call_timer_fn+0x0/0x340
 Pid: 0, comm: swapper Not tainted 3.2.0-rc2-net-next-mpls+ #1
Call Trace:
 <IRQ>  [<ffffffff8104f417>] __might_sleep+0x137/0x1f0
 [<ffffffff816b9a2f>] mutex_lock_nested+0x2f/0x370
 [<ffffffff810a89fd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8109a37f>] ? local_clock+0x6f/0x80
 [<ffffffff810a90a5>] ? lock_release_holdtime.part.22+0x15/0x1a0
 [<ffffffff81557929>] ? sock_def_write_space+0x59/0x160
 [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90
 [<ffffffff810969cd>] atomic_dec_and_mutex_lock+0x5d/0x80
 [<ffffffff8112fc1d>] jump_label_dec+0x1d/0x50
 [<ffffffff81566525>] net_disable_timestamp+0x15/0x20
 [<ffffffff81557a75>] sock_disable_timestamp+0x45/0x50
 [<ffffffff81557b00>] __sk_free+0x80/0x200
 [<ffffffff815578d0>] ? sk_send_sigurg+0x70/0x70
 [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90
 [<ffffffff81557cba>] sock_wfree+0x3a/0x70
 [<ffffffff8155c2b0>] skb_release_head_state+0x70/0x120
 [<ffffffff8155c0b6>] __kfree_skb+0x16/0x30
 [<ffffffff8155c119>] kfree_skb+0x49/0x170
 [<ffffffff815e936e>] arp_error_report+0x3e/0x90
 [<ffffffff81575bd9>] neigh_invalidate+0x89/0xc0
 [<ffffffff81578dbe>] neigh_timer_handler+0x9e/0x2a0
 [<ffffffff81578d20>] ? neigh_update+0x640/0x640
 [<ffffffff81073558>] __do_softirq+0xc8/0x3a0

Since jump_label_{inc|dec} must be called from process context only,
we must defer jump_label_dec() if net_disable_timestamp() is called
from interrupt context.
Reported-by: NIgor Maravic <igorm@etf.rs>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b90e5794

net: use skb_flow_dissect() in __skb_get_rxhash() · 4504b861

由 Eric Dumazet 提交于 11月 28, 2011

No functional changes.

This uses the code we factorized in skb_flow_dissect()
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4504b861

net: Fix corruption in /proc/*/net/dev_mcast · 5cac98dd

由 Anton Blanchard 提交于 11月 27, 2011

I just hit this during my testing. Isn't there another bug lurking?

BUG kmalloc-8: Redzone overwritten

INFO: 0xc0000000de9dec48-0xc0000000de9dec4b. First byte 0x0 instead of 0xcc
INFO: Allocated in .__seq_open_private+0x30/0xa0 age=0 cpu=5 pid=3896
	.__kmalloc+0x1e0/0x2d0
	.__seq_open_private+0x30/0xa0
	.seq_open_net+0x60/0xe0
	.dev_mc_seq_open+0x4c/0x70
	.proc_reg_open+0xd8/0x260
	.__dentry_open.clone.11+0x2b8/0x400
	.do_last+0xf4/0x950
	.path_openat+0xf8/0x480
	.do_filp_open+0x48/0xc0
	.do_sys_open+0x140/0x250
	syscall_exit+0x0/0x40

dev_mc_seq_ops uses dev_seq_start/next/stop but only allocates
sizeof(struct seq_net_private) of private data, whereas it expects
sizeof(struct dev_iter_state):

struct dev_iter_state {
	struct seq_net_private p;
	unsigned int pos; /* bucket << BUCKET_SPACE + offset */
};

Create dev_seq_open_ops and use it so we don't have to expose
struct dev_iter_state.

[ Problem added by commit f04565dd (dev: use name hash for
  dev_seq_ops) -Eric ]
Signed-off-by: NAnton Blanchard <anton@samba.org>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5cac98dd

23 11月, 2011 1 次提交

net: add network priority cgroup infrastructure (v4) · 5bc1421e

由 Neil Horman 提交于 11月 22, 2011

This patch adds in the infrastructure code to create the network priority
cgroup.  The cgroup, in addition to the standard processes file creates two
control files:

1) prioidx - This is a read-only file that exports the index of this cgroup.
This is a value that is both arbitrary and unique to a cgroup in this subsystem,
and is used to index the per-device priority map

2) priomap - This is a writeable file.  On read it reports a table of 2-tuples
<name:priority> where name is the name of a network interface and priority is
indicates the priority assigned to frames egresessing on the named interface and
originating from a pid in this cgroup

This cgroup allows for skb priority to be set prior to a root qdisc getting
selected. This is benenficial for DCB enabled systems, in that it allows for any
application to use dcb configured priorities so without application modification
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
CC: Robert Love <robert.w.love@intel.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5bc1421e

18 11月, 2011 1 次提交

net: use jump_label to shortcut RPS if not setup · adc9300e

由 Eric Dumazet 提交于 11月 17, 2011

Most machines dont use RPS/RFS, and pay a fair amount of instructions in
netif_receive_skb() / netif_rx() / get_rps_cpu() just to discover
RPS/RFS is not setup.

Add a jump_label named rps_needed.

If no device rps_map or global rps_sock_flow_table is setup,
netif_receive_skb() / netif_rx() do a single instruction instead of many
ones, including conditional jumps.

jmp +0    (if CONFIG_JUMP_LABEL=y)
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

adc9300e

17 11月, 2011 4 次提交

net: remove NETIF_F_NO_CSUM feature bit · 34324dc2

由 Michał Mirosław 提交于 11月 15, 2011

Only distinct use is checking if NETIF_F_NOCACHE_COPY should be
enabled by default. The check heuristics is altered a bit here,
so it hits other people than before. The default shouldn't be
trusted for performance-critical cases anyway.

For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM.
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

34324dc2

net: introduce and use netdev_features_t for device features sets · c8f44aff

由 Michał Mirosław 提交于 11月 15, 2011

v2:	add couple missing conversions in drivers
	split unexporting netdev_fix_features()
	implemented %pNF
	convert sock::sk_route_(no?)caps
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8f44aff

net: remove legacy ethtool ops · bc5787c6

由 Michał Mirosław 提交于 11月 15, 2011

As all drivers are converted, we may now remove discrete offload setting
callback handling.
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc5787c6

net: use jump_label for netstamp_needed · 588f0330

由 Eric Dumazet 提交于 11月 15, 2011

netstamp_needed seems a good candidate to jump_label conversion.

This avoids 3 conditional branches per incoming packet in fast path.

No measurable difference, given that these conditional branches are
predicted on modern cpus. Only a small icache reduction, thanks to the
unlikely() stuff.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

588f0330

30 10月, 2011 1 次提交

vlan: allow nested vlan_do_receive() · 6a32e4f9

由 Eric Dumazet 提交于 10月 29, 2011

commit 2425717b (net: allow vlan traffic to be received under bond)
broke ARP processing on vlan on top of bonding.

       +-------+
eth0 --| bond0 |---bond0.103
eth1 --|       |
       +-------+

52870.115435: skb_gro_reset_offset <-napi_gro_receive
52870.115435: dev_gro_receive <-napi_gro_receive
52870.115435: napi_skb_finish <-napi_gro_receive
52870.115435: netif_receive_skb <-napi_skb_finish
52870.115435: get_rps_cpu <-netif_receive_skb
52870.115435: __netif_receive_skb <-netif_receive_skb
52870.115436: vlan_do_receive <-__netif_receive_skb
52870.115436: bond_handle_frame <-__netif_receive_skb
52870.115436: vlan_do_receive <-__netif_receive_skb
52870.115436: arp_rcv <-__netif_receive_skb
52870.115436: kfree_skb <-arp_rcv

Packet is dropped in arp_rcv() because its pkt_type was set to
PACKET_OTHERHOST in the first vlan_do_receive() call, since no eth0.103
exists.

We really need to change pkt_type only if no more rx_handler is about to
be called for the packet.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: NJiri Pirko <jpirko@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a32e4f9

24 10月, 2011 1 次提交

rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces · d2237d35

由 Eric W. Biederman 提交于 10月 21, 2011

Renato Westphal noticed that since commit a2835763
"rtnetlink: handle rtnl_link netlink notifications manually" was merged
we no longer send a netlink message when a networking device is moved
from one network namespace to another.

Fix this by adding the missing manual notification in dev_change_net_namespaces.

Since all network devices that are processed by dev_change_net_namspaces are
in the initialized state the complicated tests that guard the manual
rtmsg_ifinfo calls in rollback_registered and register_netdevice are
unnecessary and we can just perform a plain notification.

Cc: stable@kernel.org
Tested-by: NRenato Westphal <renatowestphal@gmail.com>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2237d35

21 10月, 2011 1 次提交

dev: use name hash for dev_seq_ops · f04565dd

由 Mihai Maruseac 提交于 10月 20, 2011

Instead of using the dev->next chain and trying to resync at each call to
dev_seq_start, use the name hash, keeping the bucket and the offset in
seq->private field.

Tests revealed the following results for ifconfig > /dev/null
	* 1000 interfaces:
		* 0.114s without patch
		* 0.089s with patch
	* 3000 interfaces:
		* 0.489s without patch
		* 0.110s with patch
	* 5000 interfaces:
		* 1.363s without patch
		* 0.250s with patch
	* 128000 interfaces (other setup):
		* ~100s without patch
		* ~30s with patch
Signed-off-by: NMihai Maruseac <mmaruseac@ixiacom.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f04565dd

20 10月, 2011 2 次提交

net: validate HWTSTAMP ioctl parameters · 4dc360c5

由 Richard Cochran 提交于 10月 19, 2011

This patch adds a sanity check on the values provided by user space for
the hardware time stamping configuration. If the values lie outside of
the absolute limits, then the ioctl request will be denied.
Signed-off-by: NRichard Cochran <richard.cochran@omicron.at>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4dc360c5

net: Move rcu_barrier from rollback_registered_many to netdev_run_todo. · 850a545b

由 Eric W. Biederman 提交于 10月 13, 2011

This patch moves the rcu_barrier from rollback_registered_many
(inside the rtnl_lock) into netdev_run_todo (just outside the rtnl_lock).
This allows us to gain the full benefit of sychronize_net calling
synchronize_rcu_expedited when the rtnl_lock is held.

The rcu_barrier in rollback_registered_many was originally a synchronize_net
but was promoted to be a rcu_barrier() when it was found that people were
unnecessarily hitting the 250ms wait in netdev_wait_allrefs().  Changing
the rcu_barrier back to a synchronize_net is therefore safe.

Since we only care about waiting for the rcu callbacks before we get
to netdev_wait_allrefs() it is also safe to move the wait into
netdev_run_todo.

This was tested by creating and destroying 1000 tap devices and observing
/proc/lock_stat.  /proc/lock_stat reports this change reduces the hold
times of the rtnl_lock by a factor of 10.  There was no observable
difference in the amount of time it takes to destroy a network device.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

850a545b

19 10月, 2011 2 次提交

net: add skb frag size accessors · 9e903e08

由 Eric Dumazet 提交于 10月 18, 2011

To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e903e08

net: allow vlan traffic to be received under bond · 2425717b

由 John Fastabend 提交于 10月 10, 2011

The following configuration used to work as I expected. At least
we could use the fcoe interfaces to do MPIO and the bond0 iface
to do load balancing or failover.

       ---eth2.228-fcoe
       |
eth2 -----|
          |
          |---- bond0
          |
eth3 -----|
       |
       ---eth3.228-fcoe

This worked because of a change we added to allow inactive slaves
to rx 'exact' matches. This functionality was kept intact with the
rx_handler mechanism. However now the vlan interface attached to the
active slave never receives traffic because the bonding rx_handler
updates the skb->dev and goto's another_round. Previously, the
vlan_do_receive() logic was called before the bonding rx_handler.

Now by the time vlan_do_receive calls vlan_find_dev() the
skb->dev is set to bond0 and it is clear no vlan is attached
to this iface. The vlan lookup fails.

This patch moves the VLAN check above the rx_handler. A VLAN
tagged frame is now routed to the eth2.228-fcoe iface in the
above schematic. Untagged frames continue to the bond0 as
normal. This case also remains intact,

eth2 --> bond0 --> vlan.228

Here the skb is VLAN tagged but the vlan lookup fails on eth2
causing the bonding rx_handler to be called. On the second
pass the vlan lookup is on the bond0 iface and completes as
expected.

Putting a VLAN.228 on both the bond0 and eth2 device will
result in eth2.228 receiving the skb. I don't think this is
completely unexpected and was the result prior to the rx_handler
result.

Note, the same setup is also used for other storage traffic that
MPIO is used with eg. iSCSI and similar setups can be contrived
without storage protocols.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NJesse Gross <jesse@nicira.com>
Reviewed-by: NJiri Pirko <jpirko@redhat.com>
Tested-by: NHans Schillstrom <hams.schillstrom@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2425717b

04 10月, 2011 1 次提交

RPS: Ensure that an expired hardware filter can be re-added later · 09994d1b

由 Ben Hutchings 提交于 10月 03, 2011

Amir Vadai wrote:
> When a stream is paused, and its rule is expired while it is paused,
> no new rule will be configured to the HW when traffic resume.
[...]
> - When stream was resumed, traffic was steered again by RSS, and
> because current-cpu was equal to desired-cpu,  ndo_rx_flow_steer
> wasn't called and no rule was configured to the HW.

Fix this by setting the flow's current CPU only in the table for the
newly selected RX queue.
Reported-and-tested-by: NAmir Vadai <amirv@dev.mellanox.co.il>
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09994d1b

29 9月, 2011 1 次提交

net: rps: fix the support for PPPOE · 5dd17e08

由 Changli Gao 提交于 9月 20, 2011

The upper protocol numbers of PPPOE are different, and should be treated
specially.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5dd17e08

16 9月, 2011 2 次提交

net: consolidate and fix ethtool_ops->get_settings calling · 4bc71cb9

由 Jiri Pirko 提交于 9月 03, 2011

This patch does several things:
- introduces __ethtool_get_settings which is called from ethtool code and
  from drivers as well. Put ASSERT_RTNL there.
- dev_ethtool_get_settings() is replaced by __ethtool_get_settings()
- changes calling in drivers so rtnl locking is respected. In
  iboe_get_rate was previously ->get_settings() called unlocked. This
  fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same
  problem. Also fixed by calling __dev_get_by_index() instead of
  dev_get_by_index() and holding rtnl_lock for both calls.
- introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
  so bnx2fc_if_create() and fcoe_if_create() are called locked as they
  are from other places.
- use __ethtool_get_settings() in bonding code
Signed-off-by: NJiri Pirko <jpirko@redhat.com>

v2->v3:
	-removed dev_ethtool_get_settings()
	-added ASSERT_RTNL into __ethtool_get_settings()
	-prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock
	 around it and __ethtool_get_settings() call
v1->v2:
        add missing export_symbol
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits]
Acked-by: NRalf Baechle <ralf@linux-mips.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4bc71cb9

net: copy userspace buffers on device forwarding · 48c83012

由 Michael S. Tsirkin 提交于 8月 31, 2011

dev_forward_skb loops an skb back into host networking
stack which might hang on the memory indefinitely.
In particular, this can happen in macvtap in bridged mode.
Copy the userspace fragments to avoid blocking the
sender in that case.

As this patch makes skb_copy_ubufs extern now,
I also added some documentation and made it clear
the SKBTX_DEV_ZEROCOPY flag automatically instead
of doing it in all callers. This can be made into a separate
patch if people feel it's worth it.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

48c83012

25 8月, 2011 2 次提交

net: convert core to skb paged frag APIs · ea2ab693

由 Ian Campbell 提交于 8月 22, 2011

Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: netdev@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea2ab693

rps: support IPIP encapsulation · ec5efe79

由 Eric Dumazet 提交于 8月 24, 2011

Skip IPIP header to get proper layer-4 information.

Like GRE tunnels, this only works if rxhash is not already provided by
the device itself (ethtool -K ethX rxhash off), to allow kernel compute
a software rxhash.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec5efe79

23 8月, 2011 2 次提交

dynamic_debug: make netdev_dbg() call __netdev_printk() · ffa10cb4

由 Jason Baron 提交于 8月 11, 2011

Previously, if dynamic debug was enabled netdev_dbg() was using
dynamic_dev_dbg() to print out the underlying msg. Fix this by making
sure netdev_dbg() uses __netdev_printk().

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

ffa10cb4

net: vlan: goto another_round instead of calling __netif_receive_skb · 0dfe1782

由 Jiri Pirko 提交于 8月 22, 2011

Now, when vlan tag on untagged in non-accelerated path is stripped from
skb, headers are reset right away. Benefit from that and avoid calling
__netif_receive_skb recursivelly and just use another_round.
Signed-off-by: NJiri Pirko <jpirko@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0dfe1782

19 8月, 2011 2 次提交

net: rps: support PPPOE session messages · ae1511bf

由 Changli Gao 提交于 8月 18, 2011

Inspect the payload of PPPOE session messages for the 4 tuples to generate
skb->rxhash.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae1511bf

net: rps: support 802.1Q · 1ff1986f

由 Changli Gao 提交于 8月 18, 2011

For the 802.1Q packets, if the NIC doesn't support hw-accel-vlan-rx, RPS
won't inspect the internal 4 tuples to generate skb->rxhash, so this kind
of traffic can't get any benefit from RPS.

This patch adds the support for 802.1Q to RPS.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ff1986f

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功