Commits · f3591fd4c9881889dfa9203328a89316fcc834e1 · openeuler / raspberrypi-kernel

12 Jun, 2014 40 commits

Committed by David S. Miller on Jun 11, 2014

Tom Herbert says:

====================
net: Checksum offload changes - Part IV

I am working on overhauling RX checksum offload. Goals of this effort
are:

- Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY
- Preserve CHECKSUM_COMPLETE through encapsulation layers
- Don't do skb_checksum more than once per packet
- Unify GRO and non-GRO csum verification as much as possible
- Unify the checksum functions (checksum_init)
- Simplify code

What is in this fourth patch set:

- Preserve CHECKSUM_COMPLETE instead of changing it to
  CHECKSUM_UNNECESSARY. This allows correct reuse in validating multiple
  csums in a packet.
- When SW needs to compute the packet checksum, save it as
  CHECKSUM_COMPLETE. Also mark that the checksum was computed by SW.
- Add skb_gro_postpull_rcsum to udp and vxlan to make GRO work with
  CHECKSUM_COMPLETE.

v2: Removed patch setting skb_encapsulation when validating checksum
    in tcp_gro_receive

Please review carefully and test if possible, mucking with basic
checksum functions is always a little precarious :-)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f3591fd4

net: Add skb_gro_postpull_rcsum to udp and vxlan · 6bae1d4c

Committed by Tom Herbert on Jun 10, 2014

Need to call skb_gro_postpull_rcsum for GRO to work with checksum complete.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6bae1d4c

net: Save software checksum complete · 7e3cead5

Committed by Tom Herbert on Jun 10, 2014

In skb_checksum complete, if we need to compute the checksum for the
packet (via skb_checksum) save the result as CHECKSUM_COMPLETE.
Subsequent checksum verification can use this.

Also add a csum_complete_sw flag to distinguish between software and
hardware generated checksum complete; we should always be able to trust
the software computation.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e3cead5

net: Preserve CHECKSUM_COMPLETE at validation · 5d0c2b95

Committed by Tom Herbert on Jun 10, 2014

Currently when the first checksum in a packet is validated using
CHECKSUM_COMPLETE, ip_summed is overwritten to be CHECKSUM_UNNECESSARY
so that any subsequent checksums in the packet are not correctly
validated.

This patch adds csum_valid flag in sk_buff and uses that to indicate
validated checksum instead of setting CHECKSUM_UNNECESSARY. The bit
is set accordingly in the skb_checksum_validate_* functions. The flag
is checked in skb_checksum_complete, so that validation is communicated
between checksum_init and checksum_complete sequence in TCP and UDP.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d0c2b95
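
The state change described in this commit can be illustrated with a small user-space model. This is a sketch only, not kernel code: `SkbModel`, `IpSummed`, `validate_old`, `validate_new` and `can_reuse_complete` are hypothetical names that merely mimic `ip_summed` and the `csum_valid` flag from the message above.

```python
from dataclasses import dataclass
from enum import Enum, auto


class IpSummed(Enum):
    """Model of the ip_summed states mentioned in the commit message."""
    NONE = auto()
    UNNECESSARY = auto()
    COMPLETE = auto()


@dataclass
class SkbModel:
    """Hypothetical stand-in for the two sk_buff fields discussed above."""
    ip_summed: IpSummed
    csum_valid: bool = False


def validate_old(skb, checksum_ok):
    """Old behaviour: a successful validation overwrites ip_summed."""
    if skb.ip_summed is IpSummed.COMPLETE and checksum_ok:
        skb.ip_summed = IpSummed.UNNECESSARY


def validate_new(skb, checksum_ok):
    """New behaviour: record the result in csum_valid, keep ip_summed."""
    if skb.ip_summed is IpSummed.COMPLETE and checksum_ok:
        skb.csum_valid = True


def can_reuse_complete(skb):
    """A later checksum can only use the full-packet sum if it is still marked COMPLETE."""
    return skb.ip_summed is IpSummed.COMPLETE
```

In this model, a packet validated with `validate_old` can no longer have a second checksum checked against the full-packet sum, while one validated with `validate_new` can.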

Merge branch 'qlcnic-next' · 1054cc15

Committed by David S. Miller on Jun 11, 2014

Shahed Shaikh says:

====================
This series contains an enhancement in the area of firmware minidump collection
and optimization of ring count validation function.

Please apply this series to net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1054cc15

qlcnic: Update version to 5.3.60 · 038782d6

Committed by Shahed Shaikh on Jun 11, 2014

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

038782d6

qlcnic: Optimize ring count validations · 18e0d625

Committed by Shahed Shaikh on Jun 11, 2014

- Check interrupt mode at the start of qlcnic_set_channels().
- Do not validate ring counts if they are not going to change.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18e0d625

qlcnic: Pre-allocate DMA buffer used for minidump collection · 4da005cf

Committed by Shahed Shaikh on Jun 11, 2014

Pre-allocate the physically contiguous DMA buffer used for
minidump collection at driver load time, rather than at
run time, to minimize allocation failures. The driver will allocate
the buffer at load time if PEX DMA support capability is indicated
by the adapter.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4da005cf

ip_vti: fix sparse warnings for VTI_ISVTI · efd0f11d

Committed by Dmitry Popov on Jun 11, 2014

This patch fixes the following sparse warnings:

net/ipv4/ip_tunnel.c:245:53: warning: restricted __be16 degrades to integer
net/ipv4/ip_vti.c:321:19: warning: incorrect type in assignment (different base types)
net/ipv4/ip_vti.c:321:19: expected restricted __be16 [addressable] [assigned] [usertype] i_flags
net/ipv4/ip_vti.c:321:19: got int
net/ipv4/ip_vti.c:447:24: warning: incorrect type in assignment (different base types)
net/ipv4/ip_vti.c:447:24: expected restricted __be16 [usertype] i_flags
net/ipv4/ip_vti.c:447:24: got int

Since VTI_ISVTI is always used with ip_tunnel_parm->i_flags (which is __be16),
we can __force cast VTI_ISVTI to __be16 in header file.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

efd0f11d

drivers: net: davinci_cpdma: double free on error · 2f87208e

Committed by Dan Carpenter on Jun 11, 2014

We recently changed the kzalloc() to devm_kzalloc() so freeing "ctlr"
here could lead to a double free.

Fixes: e1943128 ('drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f87208e

amd-xgbe: unwind on error in xgbe_mdio_register() · 8fc908c3

Committed by Dan Carpenter on Jun 11, 2014

There is a typo here so we return directly instead of unwinding.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8fc908c3

mrf24j40: add device managed APIs · 0aaf43f5

Committed by Varka Bhadram on Jun 11, 2014

Add the device managed APIs so that there is no need to worry about
freeing the resources.
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>

0aaf43f5

ceph: remove bogus extern · f6479449

Committed by stephen hemminger on Jun 10, 2014

Sparse complained about this bogus extern on definition of
a function.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6479449

net: filter: document internal instruction encoding · 783e327b

Committed by Alexei Starovoitov on Jun 10, 2014

This patch adds a description of eBPF's instruction encoding in order
to bring the documentation in line with the implementation.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

783e327b

net: filter: mention eBPF terminology as well · e4ad4032

Committed by Alexei Starovoitov on Jun 10, 2014

Since the term eBPF is used anyway on mailing list discussions, let's
also document that in the main BPF documentation file and replace a
couple of occurrences with eBPF terminology to be more clear.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4ad4032

ipv4: fix a race in ip4_datagram_release_cb() · 9709674e

Committed by Eric Dumazet on Jun 10, 2014

Alexey gave an AddressSanitizer[1] report that finally gave a good hint
at the origin of various problems already reported by Dormando
in the past [2]

The problem comes from the fact that UDP can have a lockless TX path, and
concurrent threads can manipulate sk_dst_cache while another thread
is holding the socket lock and calls __sk_dst_set() in
ip4_datagram_release_cb() (this was added in linux-3.8)

It seems that all we need to do is to use sk_dst_check() and
sk_dst_set() so that all the writers hold same spinlock
(sk->sk_dst_lock) to prevent corruptions.

The TCP stack does not need this protection, as all sk_dst_cache writers hold
the socket lock.

[1]
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

AddressSanitizer: heap-use-after-free in ipv4_dst_check
Read of size 2 by thread T15453:
 [<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
 [<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
 [<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0
 [<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413
 [<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0
 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Freed by thread T15455:
 [<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
 [<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280
 [<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0
 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Allocated by thread T15453:
 [<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
 [<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
 [<     inlined    >] __ip_route_output_key+0x3e8/0xf70
__mkroute_output ./net/ipv4/route.c:1939
 [<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
 [<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
 [<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0
 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

[2]
<4>[196727.311203] general protection fault: 0000 [#1] SMP
<4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
<4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
<4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
<4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
<4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>]  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
<4>[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
<4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
<4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
<4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
<4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
<4>[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
<4>[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
<4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[196727.311713] Stack:
<4>[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
<4>[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
<4>[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
<4>[196727.311885] Call Trace:
<4>[196727.311907]  <IRQ>
<4>[196727.311912]  [<ffffffff815b7f42>] dst_destroy+0x32/0xe0
<4>[196727.311959]  [<ffffffff815b86c6>] dst_release+0x56/0x80
<4>[196727.311986]  [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0
<4>[196727.312013]  [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820
<4>[196727.312041]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
<4>[196727.312070]  [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150
<4>[196727.312097]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
<4>[196727.312125]  [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230
<4>[196727.312154]  [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90
<4>[196727.312183]  [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360
<4>[196727.312212]  [<ffffffff815fe00b>] ip_rcv+0x22b/0x340
<4>[196727.312242]  [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan]
<4>[196727.312275]  [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640
<4>[196727.312308]  [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150
<4>[196727.312338]  [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70
<4>[196727.312368]  [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0
<4>[196727.312397]  [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140
<4>[196727.312433]  [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe]
<4>[196727.312463]  [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340
<4>[196727.312491]  [<ffffffff815b1691>] net_rx_action+0x111/0x210
<4>[196727.312521]  [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70
<4>[196727.312552]  [<ffffffff810519d0>] __do_softirq+0xd0/0x270
<4>[196727.312583]  [<ffffffff816cef3c>] call_softirq+0x1c/0x30
<4>[196727.312613]  [<ffffffff81004205>] do_softirq+0x55/0x90
<4>[196727.312640]  [<ffffffff81051c85>] irq_exit+0x55/0x60
<4>[196727.312668]  [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0
<4>[196727.312696]  [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a
<4>[196727.312722]  <EOI>
<1>[196727.313071] RIP  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
<4>[196727.313100]  RSP <ffff885effd23a70>
<4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
<0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt
Reported-by: Alexey Preobrazhensky <preobr@google.com>
Reported-by: dormando <dormando@rydia.ne>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 8141ed9f ("ipv4: Add a socket release callback for datagram sockets")
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9709674e

net: filter: add test_bpf module under MAINTAINERS' networking section · a101ccd1

Committed by Daniel Borkmann on Jun 10, 2014

Add lib/test_bpf.c entry to maintainers file under networking.
All changes were posted via netdev for review, so make sure
other people Cc it as well when they call get_maintainer.pl.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a101ccd1

net: add __pskb_copy_fclone and pskb_copy_for_clone · bad93e9d

Committed by Octavian Purdila on Jun 12, 2014

There are several instances where a pskb_copy or __pskb_copy is
immediately followed by an skb_clone.

Add a couple of new functions to allow the copy skb to be allocated
from the fclone cache and thus speed up subsequent skb_clone calls.

Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Arvid Brodin <arvid.brodin@alten.se>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bad93e9d

Merge branch 'bridge-next' · 1a0b20b2

Committed by David S. Miller on Jun 11, 2014

Toshiaki Makita says:

====================
bridge: 802.1ad vlan protocol support

Currently bridge vlan filtering doesn't work well with the 802.1ad protocol.
It works only if the bridge is configured without pvid, receives only
802.1ad tagged frames, and uses no STP.
Otherwise:
- If pvid is configured, it can put only 802.1Q tags but cannot put 802.1ad
  tags.
- If 802.1Q and 802.1ad tagged frames arrive in mixture, it applies filtering
  regardless of their protocols.
- While an 802.1ad bridge should use another mac address for STP BPDU and
  should forward customer's BPDU frames, it can't.
Thus, we can't properly handle frames once 802.1ad is used.

Handling 802.1ad is useful if we want to allow stacked vlans to be used,
e.g., guest VMs want to use vlan tags and the host also wants to segregate
guest's traffic from other guests' by vlan tags.

Here is the image describing how to configure a bridge to filter VM traffic.

         +-------+p/u   +-----+  +---------+
 +----+  |       |------|vnet0|--|User A VM|
 |eth0|--|802.1ad|      +-----+  +---------+
 +----+  |bridge |p/u   +-----+  +---------+
         |       |------|vnet1|--|User B VM|
         +-------+      +-----+  +---------+
p/u: pvid/untagged

This patch set enables us to set vlan protocols per bridge.
This tries to implement a bridge like S-VLAN component in IEEE 802.1Q-2011
spec.

Note that there is another possible implementation that sets vlan protocols
per port. Some HW switches seem to take that approach.
However, I think the per-bridge approach is better, because:
- I think the typical usage of an 802.1ad bridge is segregating 802.1Q tagged
  traffic (like what is described above), and this doesn't need the ability to
  set protocols per port. Also, if a bridge has many ports and it supports
  per-port setting, we might have to do much more configuration to
  change protocols of all ports.

- I assume that the main purpose of setting the protocol per port is to assign S-VID
  according to C-VID, or to realize two logical bridges (one is an 802.1Q
  filtering bridge and the other is an 802.1ad filtering bridge) in one bridge.
  The former usually needs additional features such as vlan id mapping, and
  is likely to make bridge's code complicated. If a user wants, such enhanced
  features can be accomplished by a combination of multiple bridges, so it is
  not absolutely necessary to implement these features in a bridge itself.
  The latter is simply unnecessary because we can easily make two bridges of
  which one is an 802.1Q bridge and the other is an 802.1ad bridge.

Here is an example of the enhanced feature that we can realize by using
multiple bridges and veth interfaces. This way is documented in
IEEE 802.1Q-2011 clause 15.4 (C-tagged service interface).

 +----+  +-------+p/u         +------+  +----+  +--+
 |eth0|--|802.1ad|----veth----|802.1Q|--|vnet|--|VM|
 +----+  |bridge |----veth----|bridge|  +----+  +--+
         +-------+p/u         +------+
p/u: pvid/untagged

In this configuration, we can map C-VIDs to any S-VID.
For example:
 C-VID 10 and 20 to S-VID 100
 C-VID 30 to S-VID 110
This is achieved through the 802.1Q bridge that forwards C-tagged frames to
proper ports of the 802.1ad bridge.

Changes:
v1 -> v2:
- Make the way to forward bridge group addresses more generic by introducing
  new mask, group_fwd_mask_required.

RFC -> v1:
- Add S-TAG tx offload.
- Remove a fix around stacked vlan which has already been fixed.
- Take into account Bridge Group Addresses.
- Separate handling of protocol-mismatch from br_vlan_get_tag().
- Change the way to set vlan_proto from netlink to sysfs because no other
  existing configuration per bridge can be set by netlink.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1a0b20b2
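
The C-VID to S-VID example in the cover letter above can be written down as a lookup table. This is only an illustration of the mapping itself; `CVID_TO_SVID` and `svid_for` are hypothetical names, and nothing here configures a bridge.

```python
# Mapping taken from the example in the cover letter:
# C-VID 10 and 20 go to S-VID 100, C-VID 30 goes to S-VID 110.
CVID_TO_SVID = {10: 100, 20: 100, 30: 110}


def svid_for(cvid):
    """Return the S-VID for a C-VID, or None if the C-VID is not mapped."""
    return CVID_TO_SVID.get(cvid)
```

In the two-bridge setup described above, this decision is made by the 802.1Q bridge when it forwards a C-tagged frame to the appropriate port of the 802.1ad bridge.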

bridge: Support 802.1ad vlan filtering · 204177f3

Committed by Toshiaki Makita on Jun 10, 2014

This enables us to change the vlan protocol for vlan filtering.
We can now filter frames on the basis of 802.1ad vlan tags
through a bridge.

This also changes br->group_addr if it has not been set by user.
This is needed for an 802.1ad bridge.
(See IEEE 802.1Q-2011 8.13.5.)

Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad
bridge can forward the Nearest Customer Bridge group addresses except
for br->group_addr, which should be passed to higher layer.

To change the vlan protocol, write a protocol in sysfs:
# echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

204177f3
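
To make the protocol distinction concrete, the sketch below classifies an Ethernet frame by the TPID that follows the two MAC addresses: 0x8100 for 802.1Q and 0x88a8 for 802.1ad (the value written to sysfs above). It is a user-space illustration only; `vlan_tag` and `matches_filter` are hypothetical helpers, not bridge code, and how the real bridge treats a frame whose tag uses the other protocol is outside this sketch.

```python
import struct

ETH_P_8021Q = 0x8100   # TPID of an 802.1Q C-tag
ETH_P_8021AD = 0x88A8  # TPID of an 802.1ad S-tag


def vlan_tag(frame: bytes):
    """Return (tpid, vid) of the outermost VLAN tag, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 0-11 are the destination and source MAC addresses.
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid not in (ETH_P_8021Q, ETH_P_8021AD):
        return None
    return tpid, tci & 0x0FFF  # the low 12 bits of the TCI are the VLAN ID


def matches_filter(frame: bytes, vlan_proto: int, allowed_vids) -> bool:
    """True if the outer tag uses the configured protocol and an allowed VID."""
    tag = vlan_tag(frame)
    if tag is None or tag[0] != vlan_proto:
        return False
    return tag[1] in allowed_vids
```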

bridge: Prepare for forwarding another bridge group addresses · f2808d22

Committed by Toshiaki Makita on Jun 10, 2014

If a bridge is an 802.1ad bridge, it must forward other bridge group
addresses (the Nearest Customer Bridge group addresses).
(For details, see IEEE 802.1Q-2011 8.6.3.)

As a user might not want group_fwd_mask to be modified by enabling 802.1ad,
introduce a new mask, group_fwd_mask_required, which indicates addresses
the bridge wants to forward. This will be set by enabling 802.1ad.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

f2808d22

bridge: Prepare for 802.1ad vlan filtering support · 8580e211

Committed by Toshiaki Makita on Jun 10, 2014

This enables a bridge to have vlan protocol information and allows vlan
tag manipulation (retrieve, insert and remove tags) according to the vlan
protocol.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

8580e211

bridge: Add 802.1ad tx vlan acceleration · 1c5abb6c

Committed by Toshiaki Makita on Jun 10, 2014

Bridge device doesn't need to embed S-tag into skb->data.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c5abb6c

net: xen-netback: include linux/vmalloc.h again · e7b599d7

Committed by Arnd Bergmann on Jun 10, 2014

commit e9ce7cb6 ("xen-netback: Factor queue-specific data into
queue struct") added a use of vzalloc/vfree to interface.c, but
removed the #include <linux/vmalloc.h> statement at the same time,
which causes this build error:

drivers/net/xen-netback/interface.c: In function 'xenvif_free':
drivers/net/xen-netback/interface.c:754:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
  vfree(vif->queues);
  ^
cc1: some warnings being treated as errors
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7b599d7

net: phy: realtek: register/unregister multiple drivers properly · 71b9c4a8

Committed by Jongsung Kim on Jun 10, 2014

Using the phy_drivers_register/_unregister functions is the proper way to
handle registration of multiple PHY drivers. For the Realtek PHY drivers
module, it fixes the current incomplete error handling and adds the
missing unregistration for the RTL8201CP driver.
Signed-off-by: Jongsung Kim <neidhard.kim@lge.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

71b9c4a8

net: sh_eth: Fix timing of RACT setting in sh_eth_rx() · 1b72a0fc

Committed by Yoshihiro Shimoda on Jun 10, 2014

This patch fixes an issue that we cannot use nfs rootfs correctly
on r8a7790 when the command below runs on a host PC.

 $ sudo ping -f -l 8 $BOARD_IP_ADDR

Since the driver sets the RACT to 1 in the first while loop of
sh_eth_rx(), the controller accepts a next frame into the next RX
descriptor during the while loop. But the first while loop
doesn't allocate a next skb. So, this patch removes the RACT setting
in the first while loop of sh_eth_rx().
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b72a0fc

net: sh_eth: Fix receive packet "exceeded" condition in sh_eth_rx() · 4f809cea

Committed by Yoshihiro Shimoda on Jun 10, 2014

This patch fixes the packet "exceeded" condition in sh_eth_rx() when
RACT in an RX descriptor is not set and the "quota" is 0.
Otherwise, a kernel panic happens because the "&n->poll_list" is deleted
twice in sh_eth_poll() which calls napi_complete() and net_rx_action().
Signed-off-by: Kouei Abe <kouei.abe.cp@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f809cea

net: filter: fix warning on 32-bit arch · 61f83d0d

Committed by Alexei Starovoitov on Jun 11, 2014

Fix compiler warning on 32-bit architectures:

net/core/filter.c: In function '__sk_run_filter':
net/core/filter.c:540:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
net/core/filter.c:550:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
net/core/filter.c:560:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

61f83d0d

tipc: fix potential bug in function tipc_backlog_rcv · 02c00c2a

Committed by Jon Paul Maloy on Jun 09, 2014

In commit 4f4482dc ("tipc: compensate
for double accounting in socket rcv buffer") we access 'truesize' of
a received buffer after it might have been released by the function
filter_rcv().

In this commit we correct this by reading the value of 'truesize' to
the stack before delivering the buffer to filter_rcv().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02c00c2a
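
The pattern behind this fix is to copy a field out of a buffer before handing the buffer to a function that may release it. The sketch below models that ordering in user space; `Buf`, `filter_rcv_model` and `backlog_rcv_model` are hypothetical stand-ins, not the TIPC code.

```python
class Buf:
    """Hypothetical buffer whose fields become unusable once released."""

    def __init__(self, truesize):
        self.truesize = truesize
        self.released = False

    def release(self):
        self.released = True
        self.truesize = None  # stands in for freed memory


def filter_rcv_model(buf):
    """Stand-in consumer that may release the buffer it is given."""
    buf.release()


def backlog_rcv_model(buf):
    """Read truesize first, then deliver; never touch buf afterwards."""
    truesize = buf.truesize   # copied while the buffer is still valid
    filter_rcv_model(buf)     # buf may be gone after this call
    return truesize
```

Reading `buf.truesize` after the call to `filter_rcv_model` would return `None` in this model, which corresponds to the access-after-release the commit message describes.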

net: sxgbe: remove duplicate SXGBE_CORE_L34_ADDCTL_REG define · cf97b8ff

Committed by Dan Carpenter on Jun 09, 2014

The SXGBE_CORE_L34_ADDCTL_REG define is cut and pasted twice so we can
delete the second instance.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf97b8ff

qlcnic: remove duplicate QLC_83XX_GET_LSO_CAPABILITY define · 5e3ec11b

Committed by Dan Carpenter on Jun 09, 2014

The QLC_83XX_GET_LSO_CAPABILITY define is cut and pasted twice so we can
delete the second instance.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e3ec11b

Merge branch 'mlx4' · 9b07d735

Committed by David S. Miller on Jun 11, 2014

Amir Vadai says:

====================
cpumask,net: affinity hint helper function

This patchset will set affinity hint to influence IRQs to be allocated on the
same NUMA node as the one where the card resides. As discussed in
http://www.spinics.net/lists/netdev/msg271497.html

If number of IRQs allocated is greater than the number of local NUMA cores, all
local cores will be used first, and the rest of the IRQs will be on a remote
NUMA node.
If there is no NUMA support, IRQs and cores will be mapped 1:1.

Since the utility function to calculate the mapping could be useful in other mq
drivers in the kernel, it was added to cpumask.[ch]

This patchset was tested and applied on top of net-next since the first
consumer is a network device (mlx4_en).  Over commit fff1f59b "mac802154:
llsec: add forgotten list_del_rcu in key removal"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9b07d735

net/mlx4_en: Use affinity hint · 9e311e77

Committed by Yuval Atias on Jun 09, 2014

The “affinity hint” mechanism is used by the user space
daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
Irqbalancer can use this hint to balance the irqs between the
cpus indicated by the mask.

We wish the HCA to preferentially map the IRQs it uses to numa cores
close to it.  To accomplish this, we use cpumask_set_cpu_local_first(), which
sets the affinity hint according to the following policy:
First it maps IRQs to “close” numa cores.  If these are exhausted, the
remaining IRQs are mapped to “far” numa cores.
Signed-off-by: Yuval Atias <yuvala@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e311e77

cpumask: Utility function to set n'th cpu - local cpu first · da91309e

Committed by Amir Vadai on Jun 09, 2014

This function sets the n'th cpu, local cpus first.
For example, on a 16-core server where the even-numbered cpus are local,
we get the following values:
cpumask_set_cpu_local_first(0, numa, cpumask) => cpu 0 is set
cpumask_set_cpu_local_first(1, numa, cpumask) => cpu 2 is set
...
cpumask_set_cpu_local_first(7, numa, cpumask) => cpu 14 is set
cpumask_set_cpu_local_first(8, numa, cpumask) => cpu 1 is set
cpumask_set_cpu_local_first(9, numa, cpumask) => cpu 3 is set
...
cpumask_set_cpu_local_first(15, numa, cpumask) => cpu 15 is set

Currently this function will be used by multi queue networking devices to
calculate the irq affinity mask, such that as many local cpus as
possible will be utilized to handle the mq device irqs.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

da91309e
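
The ordering in the example above (local cpus first, then remote ones) can be reproduced with a few lines of user-space code. This is a sketch of the selection rule only; `nth_cpu_local_first` is a hypothetical helper with its own signature and does not mirror the kernel function's arguments or return value.

```python
def nth_cpu_local_first(n, local_cpus, all_cpus):
    """Return the n-th cpu when local cpus are ordered before remote ones.

    local_cpus: set of cpu ids on the device's NUMA node (assumed input).
    all_cpus:   iterable of all cpu ids in ascending order.
    """
    all_cpus = list(all_cpus)
    local = [c for c in all_cpus if c in local_cpus]
    remote = [c for c in all_cpus if c not in local_cpus]
    return (local + remote)[n]


# The 16-core example from the commit message: even cpus are local.
even_local = {c for c in range(16) if c % 2 == 0}
picked = [nth_cpu_local_first(n, even_local, range(16)) for n in range(16)]
```

Here `picked` starts with the eight even cpus and continues with the odd ones, matching the values listed in the commit message.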

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · d4f38620

Committed by David S. Miller on Jun 11, 2014

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-06-11

This series contains updates to igb, i40e and i40evf.

Todd makes a change to igb to un-hide invariant returns by getting rid of
the E1000_SUCCESS define and converting those returns to return 0.

Jacob separates the hardware logic from the set function, so that we can
re-use it during a ptp_reset in igb.  This enables the reset to return
functionality to the last known timestamp mode, rather than resetting the
value.

Ashish implements context flags for headwb and headwb_addr so that we
do not have to keep them always enabled.

Shannon updates the admin queue API for the new firmware, which adds
set_pf_content, nvm_config_read/write, replaces set_phy_reset with
set_phy_debug and removes nvm_read/write_reg_se.  Cleans up the driver
to use the stored base_queue value since there is no need to read the
PCI register for the PF's base queue on every single transmit queue
enable and disable as we already have the value stored from reading
the capability features at startup.

Anjali changes the notion of source and destination for FD_SB in ethtool
to align i40e with other drivers.  Adds flow director statistics to
the PF stats.  Fixes a bug in ethtool for flow director drop packet
filter where the drop action comes down as a ring_cookie value, so allow
it as a special value that can be used to configure destination control.

Mitch fixes the i40evf to keep the driver from going down when it is
already in a down state.  This prevents a CPU soft lock in napi_disable().
Also change the i40evf to check the admin queue error bits since the
firmware can indicate any admin queue error states to the driver via
some bits in the length registers.

Neerav separates out the DCB capability and enabled flags because currently
if the firmware reports DCB capability the driver enables
I40E_FLAG_DCB_ENABLED flag.  When this flag is enabled the driver inserts
a tag when transmitting a packet from the port even if there are no DCB
traffic classes configured at the port.  So by adding the additional flag,
I40E_FLAG_DCB_CAPABLE, that will be set when the DCB capability is present
and the existing enabled flag will only be set if there are more than one
traffic classes configured at the port.

Greg fixes the i40e driver to not automatically accept tagged packets by
default so that the system must request a VLAN tag packet filter to get
packets with that tag.  Greg also converts i40e to use the in-kernel
ether_addr_copy() instead of memcpy().

Jesse removes the FTYPE field from the receive descriptor to match the
hardware implementation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sctp-next' · 813ebbbf

Committed by David S. Miller on Jun 11, 2014

Daniel Borkmann says:

====================
SCTP update

This set contains transport path selection improvements in
SCTP. Please see individual patches for details.
====================
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: fix incorrect type in gfp initializer · 9b87d465

Committed by Daniel Borkmann on Jun 11, 2014

This fixes the following sparse warning:

net/sctp/associola.c:1556:29: warning: incorrect type in initializer (different base types)
net/sctp/associola.c:1556:29: expected bool [unsigned] [usertype] preload
net/sctp/associola.c:1556:29: got restricted gfp_t
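
For readers unfamiliar with this class of warning, here is a hedged userspace sketch. It assumes the flagged initializer masks a gfp_t value straight into a bool, and it shows one common way to quiet sparse: normalizing with "!!". The type demo_gfp_t, the flag DEMO_GFP_WAIT and all function names are stand-ins invented for the example, and the flag value is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in type: in the kernel, gfp_t is a sparse-annotated (__bitwise)
 * type, which is why sparse calls it "restricted". */
typedef unsigned int demo_gfp_t;

/* Illustrative flag value only. */
#define DEMO_GFP_WAIT ((demo_gfp_t)0x10u)

/* Form that sparse would flag: a restricted flag type initializes a bool. */
bool preload_implicit(demo_gfp_t gfp)
{
    bool preload = gfp & DEMO_GFP_WAIT;
    return preload;
}

/* Explicit normalization: "!!" yields a plain int that is 0 or 1. */
bool preload_explicit(demo_gfp_t gfp)
{
    bool preload = !!(gfp & DEMO_GFP_WAIT);
    return preload;
}

/* Both forms agree at run time; only the static type check differs. */
void check_preload_forms(demo_gfp_t gfp)
{
    assert(preload_implicit(gfp) == preload_explicit(gfp));
}
```

The run-time result is the same in both forms, so a change of this kind affects only what the static checker sees.
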
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: improve sctp_select_active_and_retran_path selection · a7288c4d

Committed by Daniel Borkmann on Jun 11, 2014

In function sctp_select_active_and_retran_path(), we walk the
transport list in order to look for the two most recently used
ACTIVE transports (trans_pri, trans_sec). In case we didn't find
anything ACTIVE, we currently just camp on a possibly PF or
INACTIVE transport that is primary path; this behavior actually
dates back to linux-history tree of the very early days of
lksctp, and can yield a behavior that chooses suboptimal
transport paths.

Instead, be a bit more clever by reusing and extending the
recently introduced sctp_trans_elect_best() handler. In case
both transports are evaluated to have the same score resulting
from their states, break the tie by looking at: 1) transport
path error count 2) last_time_heard value from each transport.

This is analogous to Nishida's Quick Failover draft [1],
section 5.1, 3:

  The sender SHOULD avoid data transmission to PF destinations.
  When all destinations are in either PF or Inactive state,
  the sender MAY either move the destination from PF to active
  state (and transmit data to the active destination) or the
  sender MAY transmit data to a PF destination. In the former
  scenario, (i) the sender MUST NOT notify the ULP about the
  state transition, and (ii) MUST NOT clear the destination's
  error counter. It is recommended that the sender picks the
  PF destination with least error count (fewest consecutive
  timeouts) for data transmission. In case of a tie (multiple PF
  destinations with same error count), the sender MAY choose the
  last active destination.

Thus for sctp_select_active_and_retran_path(), we keep track of
the best, if any, transport that is in PF state and in case no
ACTIVE transport has been found (hence trans_{pri,sec} is NULL),
we select the best out of the three: current primary_path and
retran_path as well as a possible PF transport.

The secondary may still camp on the original primary_path as
before. The change in sctp_trans_elect_best() with a more fine
grained tie selection also improves at the same time path selection
for sctp_assoc_update_retran_path() in case of non-ACTIVE states.
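
The election order described in this message (state score first, then error count, then last_time_heard) can be sketched in userspace as below. This is a simplified illustration under stated assumptions, not the kernel code: struct path, the three-level state ranking and the function names elect_tie() and elect_best() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified ranking; a higher value is preferred. This is an assumption
 * made for the example and not the kernel's full state table. */
enum path_state { PATH_INACTIVE = 0, PATH_PF = 1, PATH_ACTIVE = 2 };

struct path {
    enum path_state state;
    unsigned int error_count;      /* consecutive errors on this path */
    int64_t last_time_heard_ns;    /* larger means heard more recently */
};

/* Tie-break for equal scores: fewer errors wins, then the path that was
 * heard from more recently. */
const struct path *elect_tie(const struct path *a, const struct path *b)
{
    if (a->error_count != b->error_count)
        return a->error_count < b->error_count ? a : b;
    return b->last_time_heard_ns > a->last_time_heard_ns ? b : a;
}

/* Pick the better of the current candidate and the best seen so far. */
const struct path *elect_best(const struct path *curr,
                              const struct path *best)
{
    if (best == NULL || curr == best)
        return curr;
    assert(curr != NULL);

    if (curr->state != best->state)
        return curr->state > best->state ? curr : best;
    return elect_tie(curr, best);
}
```

Folding elect_best() over the transport list, starting from a NULL best, yields the preferred candidate among non-ACTIVE transports.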

  [1] http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: migrate most recently used transport to ktime · e575235f

Committed by Daniel Borkmann on Jun 11, 2014

Be more precise in transport path selection and use ktime
helpers instead of jiffies to compare and pick the better
primary and secondary recently used transports. This also
avoids any side-effects during a possible roll-over, and
could lead to better path decision-making.
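
To make the roll-over and precision points concrete, here is a small userspace sketch. It compares a wrapping 32-bit tick counter (a stand-in for jiffies on a 32-bit system) with signed 64-bit nanosecond timestamps (a stand-in for ktime_t). The function names and the 4 ms tick length used in the checks are assumptions for illustration, and the naive tick comparison is deliberately simplistic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Naive "is a newer than b" on a wrapping 32-bit tick counter. */
bool ticks_newer_naive(uint32_t a, uint32_t b)
{
    return a > b;
}

/* The same question on signed 64-bit nanosecond timestamps. */
bool ns_newer(int64_t a, int64_t b)
{
    return a > b;
}

/* Convert nanoseconds to ticks of a given length, truncating to 32 bits
 * the way a wrapping counter would. Expects ns >= 0. */
uint32_t ns_to_ticks(int64_t ns, int64_t ns_per_tick)
{
    assert(ns >= 0 && ns_per_tick > 0);
    return (uint32_t)(ns / ns_per_tick);
}
```

Two events just before and just after the counter wraps compare in the wrong order with the naive tick check, and two events inside the same tick cannot be ordered at all. The nanosecond comparison handles both cases.
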
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: refactor active path selection · b82e8f31

Committed by Daniel Borkmann on Jun 11, 2014

This patch just refactors and moves the code for the active
path selection into its own helper function outside of
sctp_assoc_control_transport() which is already big enough.
No functional changes here.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
