提交 · 60ea5f83cddf538a4509f2214ffd50d8d69952a5 · openeuler / raspberrypi-kernel

18 1月, 2014 27 次提交

由 Jesse Brandeburg 提交于 1月 17, 2014

The FLAG_FDIR_* defines can be renamed to be more descriptive.
This patch is in preparation for the following where the fdir
code is refactored.
Signed-off-by: NAnjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60ea5f83

i40e: whitespace fixes · 8fb905b3

由 Jesse Brandeburg 提交于 1月 17, 2014

Fix more whitespace issues, including making some locals declared
in a nicer order.

Also update Copyright string printed when the driver loads.
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fb905b3

i40e: Change firmware workaround · d0b10249

由 Jesse Brandeburg 提交于 1月 17, 2014

Remove a workaround that is no longer necessary and implement
a better understanding of what firmware is returning in the MSI-X
vector count.  This makes it so that the driver ends up with the
right amount of queues when using all available MSI-X vectors.

Change-ID: I34e60cc71dcfb1b5412f37df956fedcc49ade187
Signed-off-by: NCatherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d0b10249

i40e: fix compile warning on checksum_local · e15c9fa0

由 Jesse Brandeburg 提交于 1月 17, 2014

Compile testing with higher warning levels found this complaint:
i40e_nvm.c: warning: 'checksum_local' may be used uninitialized in
this function
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e15c9fa0

Bluetooth: remove direct compilation of 6lowpan_iphc.c · 7bbc084c

由 Stephen Warren 提交于 1月 17, 2014

It's now built as a separate utility module, and enabling BT selects
that module in Kconfig. This fixes:

net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): multiple definition of `__ksymtab_lowpan_process_data'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): first defined here
net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): multiple definition of `__ksymtab_lowpan_header_compress'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): first defined here
net/ieee802154/built-in.o: In function `lowpan_header_compress':
net/ieee802154/6lowpan_iphc.c:606: multiple definition of `lowpan_header_compress'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:606: first defined here
net/ieee802154/built-in.o: In function `lowpan_process_data':
net/ieee802154/6lowpan_iphc.c:344: multiple definition of `lowpan_process_data'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:344: first defined here
make[1]: *** [net/built-in.o] Error 1

(this change probably simply wasn't "git add"d to a53d34c3)

Fixes: a53d34c3 ("net: move 6lowpan compression code to separate module")
Fixes: 18722c24 ("Bluetooth: Enable 6LoWPAN support for BT LE devices")
Signed-off-by: NStephen Warren <swarren@nvidia.com>
Acked-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7bbc084c

virtio-net: fix build error when CONFIG_AVERAGE is not enabled · 8cabd6a0

由 Michael Dalton 提交于 1月 17, 2014

Commit ab7db917 ("virtio-net: auto-tune mergeable rx buffer size for
improved performance") introduced a virtio-net dependency on EWMA.
The inclusion of EWMA is controlled by CONFIG_AVERAGE. Fix build error
when CONFIG_AVERAGE is not enabled by adding select AVERAGE to
virtio-net's Kconfig entry.

Build failure reported using config make ARCH=s390 defconfig.
Signed-off-by: NMichael Dalton <mwdalton@google.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8cabd6a0

Merge branch 'ixgbe' · d7c35c38

由 David S. Miller 提交于 1月 17, 2014

Aaron Brown says:

====================
Intel Wired LAN Driver Updates

This series contains an updates to ixgbe and ixgbevf.

Jacob add braces around some ixgbe_qv_lock_* calls lto better adhere
to Kernel style guidelines.  Don bumps the versions on ixgbe and
ixgbevf to match internal driver functionality better.
====================
Reviewed-by: NDing Tianhong <dingtianhong@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d7c35c38

ixgbevf: bump version · 86f359f6

由 Don Skidmore 提交于 1月 17, 2014

Bump the version number to better match functionality provided with out of
tree driver of the same version.
Signed-off-by: NDon Skidmore <donald.c.skidmore@intel.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

86f359f6

ixgbe: bump version number · f341c4e0

由 Don Skidmore 提交于 1月 17, 2014

Bump the version number to better match functionality provided with out
of tree driver of the same version.
Signed-off-by: NDon Skidmore <donald.c.skidmore@intel.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f341c4e0

ixgbe: add braces around else condition in ixgbe_qv_lock_* calls · 78d820e8

由 Jacob Keller 提交于 1月 17, 2014

This patch adds braces around the ixgbe_qv_lock_* calls which previously only
had braces around the if portion. Kernel style guidelines for this require
parenthesis around all conditions if they are required around one. In addition
the comment while not illegal C syntax makes the code look wrong at a cursory
glance. This patch corrects the style and adds braces so that the full if-else
block is uniform.
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

78d820e8

net: ftgmac100: use kfree_skb() where appropriate · 0113e34b

由 Eric Dumazet 提交于 1月 16, 2014

In order to get correct drop monitor notifications for dropped
packets, we should call kfree_skb() instead of dev_kfree_skb()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0113e34b

Merge branch 'bonding_slave_sysfs' · ff1a9d35

由 David S. Miller 提交于 1月 17, 2014

Scott Feldman says:

====================
bonding: add slave netlink and sysfs support

v2:

  - Address review comment from Ding (and Veacesiav): handle kobj cleanup
    if sysfs_create_file() fails when adding slave attribute nodes.

v1:

  The following series adds bonding slave netlink and sysfs interfaces.
  Slave interfaces get a new IFLA_SLAVE set of netlink attributes, along
  with RTM_NEWLINK notification when slave's active status changes.  The
  sysfs interface adds read-only nodes for slave attributes under a /slave
  dir, simliar to how bond interfaces get a /bonding dir for bonding
  attributes.
====================
Reviewed-by: NDing Tianhong <dingtianhong@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff1a9d35

bonding: add netlink attributes to slave link dev · 1d3ee88a

由 sfeldma@cumulusnetworks.com 提交于 1月 16, 2014

If link is IFF_SLAVE, extend link dev netlink attributes to include
slave attributes with new IFLA_SLAVE nest.  Add netlink notification
(RTM_NEWLINK) when slave status changes from backup to active, or
visa-versa.

Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE
attributes.  Currently only used by bonding driver, but could be
used by other aggregating devices with slaves.
Signed-off-by: NScott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d3ee88a

bonding: add sysfs /slave dir for bond slave devices. · 07699f9a

由 sfeldma@cumulusnetworks.com 提交于 1月 16, 2014

Add sub-directory under /sys/class/net/<interface>/slave with
read-only attributes for slave.  Directory only appears when
<interface> is a slave.

$ tree /sys/class/net/eth2/slave/
/sys/class/net/eth2/slave/
├── ad_aggregator_id
├── link_failure_count
├── mii_status
├── perm_hwaddr
├── queue_id
└── state

$ cat /sys/class/net/eth2/slave/*
2
0
up
40:02:10:ef:06:01
0
active
Signed-off-by: NScott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

07699f9a

net: vxlan: do not use vxlan_net before checking event type · fd27e0d4

由 Daniel Borkmann 提交于 1月 17, 2014

Jesse Brandeburg reported that commit acaf4e70 caused a panic
when adding a network namespace while vxlan module was present in
the system:

[<ffffffff814d0865>] vxlan_lowerdev_event+0xf5/0x100
[<ffffffff816e9e5d>] notifier_call_chain+0x4d/0x70
[<ffffffff810912be>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff810912d6>] raw_notifier_call_chain+0x16/0x20
[<ffffffff815d9610>] call_netdevice_notifiers_info+0x40/0x70
[<ffffffff815d9656>] call_netdevice_notifiers+0x16/0x20
[<ffffffff815e1bce>] register_netdevice+0x1be/0x3a0
[<ffffffff815e1dce>] register_netdev+0x1e/0x30
[<ffffffff814cb94a>] loopback_net_init+0x4a/0xb0
[<ffffffffa016ed6e>] ? lockd_init_net+0x6e/0xb0 [lockd]
[<ffffffff815d6bac>] ops_init+0x4c/0x150
[<ffffffff815d6d23>] setup_net+0x73/0x110
[<ffffffff815d725b>] copy_net_ns+0x7b/0x100
[<ffffffff81090e11>] create_new_namespaces+0x101/0x1b0
[<ffffffff81090f45>] copy_namespaces+0x85/0xb0
[<ffffffff810693d5>] copy_process.part.26+0x935/0x1500
[<ffffffff811d5186>] ? mntput+0x26/0x40
[<ffffffff8106a15c>] do_fork+0xbc/0x2e0
[<ffffffff811b7f2e>] ? ____fput+0xe/0x10
[<ffffffff81089c5c>] ? task_work_run+0xac/0xe0
[<ffffffff8106a406>] SyS_clone+0x16/0x20
[<ffffffff816ee689>] stub_clone+0x69/0x90
[<ffffffff816ee329>] ? system_call_fastpath+0x16/0x1b

Apparently loopback device is being registered first and thus we
receive an event notification when vxlan_net is not ready. Hence,
when we call net_generic() and request vxlan_net_id, we seem to
access garbage at that point in time. In setup_net() where we set
up a newly allocated network namespace, we traverse the list of
pernet ops ...

list_for_each_entry(ops, &pernet_list, list) {
	error = ops_init(ops, net);
	if (error < 0)
		goto out_undo;
}

... and loopback_net_init() is invoked first here, so in the middle
of setup_net() we get this notification in vxlan. As currently we
only care about devices that unregister, move access through
net_generic() there. Fix is based on Cong Wang's proposal, but
only changes what is needed here. It sucks a bit as we only work
around the actual cure: right now it seems the only way to check if
a netns actually finished traversing all init ops would be to check
if it's part of net_namespace_list. But that I find quite expensive
each time we go through a notifier callback. Anyway, did a couple
of tests and it seems good for now.

Fixes: acaf4e70 ("net: vxlan: when lower dev unregisters remove vxlan dev as well")
Reported-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Tested-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd27e0d4

Merge branch 'ixgbe' · 7dd40c19

由 David S. Miller 提交于 1月 17, 2014

Aaron Brown says:

====================
Intel Wired LAN Driver Updates

This series contains updates to ixgbe Ethan Zhao.  The first one replaces
the magic number "63" with a macro, IXGBE_MAX_VFS_DRV_LIMIT, the second
moves the call to set driver_max_VFS to before SRIOV is enabled.

The code of these patches match the v3 (1/2) and v2 (2/2) versions sent
to the e1000-devel and netdev mailing lists.  The intermediate versions
(v4, v5) are from sorting out style issues, mostly tabs to spaces and
split lines probably introduced via mailer.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7dd40c19

ixgbe: set driver_max_VFs should be done before enabling SRIOV · 31ac910e

由 ethan.zhao 提交于 1月 16, 2014

commit 43dc4e01 Limit number of reported VFs to device
 specific value It doesn't work and always returns -EBUSY because VFs are
 already enabled.

ixgbe_enable_sriov()
        pci_enable_sriov()
                sriov_enable()
                {
                ... ..
                iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
                pci_cfg_access_lock(dev);
                ... ...
                }

pci_sriov_set_totalvfs()
{
... ...
if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
                return -EBUSY;
...
}

So should set driver_max_VFs with pci_sriov_set_totalvfs() before
enable VFs with ixgbe_enable_sriov().

V2: revised for net-next tree.
Signed-off-by: NEthan Zhao <ethan.kernel@gmail.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31ac910e

ixgbe: define IXGBE_MAX_VFS_DRV_LIMIT macro and cleanup const 63 · dcc23e3a

由 ethan.zhao 提交于 1月 16, 2014

Because ixgbe driver limit the max number of VF
 functions could be enabled to 63, so define one macro IXGBE_MAX_VFS_DRV_LIMIT
 and cleanup the const 63 in code.

v3: revised for net-next tree.
Signed-off-by: NEthan Zhao <ethan.kernel@gmail.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dcc23e3a

ipv4: fix a dst leak in tunnels · 6c7e7610

由 Eric Dumazet 提交于 1月 16, 2014

This patch :

1) Remove a dst leak if DST_NOCACHE was set on dst
   Fix this by holding a reference only if dst really cached.

2) Remove a lockdep warning in __tunnel_dst_set()
    This was reported by Cong Wang.

3) Remove usage of a spinlock where xchg() is enough

4) Remove some spurious inline keywords.
   Let compiler decide for us.

Fixes: 7d442fab ("ipv4: Cache dst in tunnels")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c7e7610

sh_eth: Add support for r7s72100 · db893473

由 Simon Horman 提交于 1月 17, 2014

The r7s72100 SoC includes a fast ethernet controller.
Signed-off-by: NSimon Horman <horms+renesas@verge.net.au>
Acked-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db893473

sh_eth: Use bool as return type of sh_eth_is_gether() · 504c8ca5

由 Simon Horman 提交于 1月 17, 2014

Return a boolean from sh_eth_is_gether() and refactor it as a one-liner.
Signed-off-by: NSimon Horman <horms+renesas@verge.net.au>
Acked-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

504c8ca5

ipv6: send Change Status Report after DAD is completed · 6a7cc418

由 Flavio Leitner 提交于 1月 16, 2014

The RFC 3810 defines two type of messages for multicast
listeners. The "Current State Report" message, as the name
implies, refreshes the *current* state to the querier.
Since the querier sends Query messages periodically, there
is no need to retransmit the report.

On the other hand, any change should be reported immediately
using "State Change Report" messages. Since it's an event
triggered by a change and that it can be affected by packet
loss, the rfc states it should be retransmitted [RobVar] times
to make sure routers will receive timely.

Currently, we are sending "Current State Reports" after
DAD is completed.  Before that, we send messages using
unspecified address (::) which should be silently discarded
by routers.

This patch changes to send "State Change Report" messages
after DAD is completed fixing the behavior to be RFC compliant
and also to pass TAHI IPv6 testsuite.
Signed-off-by: NFlavio Leitner <fbl@redhat.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a7cc418

qlcnic: remove unused code · c3bc40e2

由 stephen hemminger 提交于 1月 16, 2014

Remove function  qlcnic_enable_eswitch which was defined
but never used in current code.

Compile tested only.
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3bc40e2

qlcnic: make local functions static · 21041400

由 stephen hemminger 提交于 1月 16, 2014

Functions only used in one file should be static.
Found by running make namespacecheck

Compile tested only.
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

21041400

ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAIT · 1d13a96c

由 Florent Fourcot 提交于 1月 16, 2014

This patch is following the commit b903d324 (ipv6: tcp: fix TCLASS
value in ACK messages sent from TIME_WAIT).

For the same reason than tclass, we have to store the flow label in the
inet_timewait_sock to provide consistency of flow label on the last ACK.
Signed-off-by: NFlorent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d13a96c

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next · d037c4d7

由 David S. Miller 提交于 1月 17, 2014

John W. Linville says:

====================
Please pull this batch of updates for the 3.14 stream!

For the mac80211 bits, Johannes says:

"This time I have uAPSD fixes since I was working on that, hwsim
improvements to make dynamic radios possible for the test suite, the
evidently long-overdue channel_change_time removal and a few other small
collected fix and improvements."

For the iwlwifi bits, Emmanuel says:

"Besides a few trivial patches, I have an important workaround for a HW
issue that has kept me busy for a long time. Along with it, a fix that
prevents an error from being printed.
Eyal fixes our behavior against SISO APs and Ilan fixes an issue with
multiple interface scenarios.
Eliad fixes an error path in our init flow.
We also have a few 'static analyzers' fix."

For the NFC bits, Samuel says:

"It includes:

* A new NFC driver for Marvell's 8897, and a few NCI fixes and
  improvements needed to support this chipset.

* An LLCP fix for how we were setting the default MIU on a p2p link. If
  there is no explicit MIU extension announced at connection time, we
  must use the default one and not the one announced at LLCP link
  establishement time.

* A pn544 EEPROM config update. Some of the currently EEPROM configured
  values are overwriting the firmware ones while other should not be set
  by the driver itself.

* Some NFC digital stack fixes and improvements. Asynchronous functions
  are better documented, RF technologies and CRC functions are set upon
  PSL_REQ reception, and a few minor bugs are fixed.

* Minor and miscelaneous pn533, mei_phy and port100 fixes."

For the ath bits, Kalle says:

"Janusz added Kconfig option for DFS. The DFS code was there already, but
after fixes to mac80211 we can now enable it.

Bartosz added a runtime firmware feature flag to disable P2P. Our 10.1
firmware branch doesn't support P2P and ath10k can now disable that. He
also added a limit for how many clients can connect to ath10k AP.

Michal fixed WEP shared authentication, in case someone still uses it.
And I added firmware debug log to help the firmware engineers."

Along with that is a small batch of ath9k updates and a few other bits
here and there.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d037c4d7

Merge branch 'master' of... · 7916a075

由 John W. Linville 提交于 1月 17, 2014

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem

7916a075

17 1月, 2014 13 次提交

Merge branch 'virtio_rx_merging' · cf84eb0b

由 David S. Miller 提交于 1月 16, 2014

Michael Dalton says:

====================
virtio-net: mergeable rx buffer size auto-tuning

The virtio-net device currently uses aligned MTU-sized mergeable receive
packet buffers. Network throughput for workloads with large average
packet size can be improved by posting larger receive packet buffers.
However, due to SKB truesize effects, posting large (e.g, PAGE_SIZE)
buffers reduces the throughput of workloads that do not benefit from GRO
and have no large inbound packets.

This patchset introduces virtio-net mergeable buffer size auto-tuning,
with buffer sizes ranging from aligned MTU-size to PAGE_SIZE. Packet
buffer size is chosen based on a per-receive queue EWMA of incoming
packet size.

To unify mergeable receive buffer memory allocation and improve
SKB frag coalescing, all mergeable buffer memory allocation is
migrated to per-receive queue page frag allocators.

The per-receive queue mergeable packet buffer size is exported via
sysfs, and the network device sysfs layer has been extended to add
support for device-specific per-receive queue sysfs attribute groups.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf84eb0b

virtio-net: initial rx sysfs support, export mergeable rx buffer size · fbf28d78

由 Michael Dalton 提交于 1月 16, 2014

Add initial support for per-rx queue sysfs attributes to virtio-net. If
mergeable packet buffers are enabled, adds a read-only mergeable packet
buffer size sysfs attribute for each RX queue.
Suggested-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NMichael Dalton <mwdalton@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fbf28d78

lib: Ensure EWMA does not store wrong intermediate values · 03144b58

由 Michael Dalton 提交于 1月 16, 2014

To ensure ewma_read() without a lock returns a valid but possibly
out of date average, modify ewma_add() by using ACCESS_ONCE to prevent
intermediate wrong values from being written to avg->internal.
Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NMichael Dalton <mwdalton@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03144b58

net-sysfs: add support for device-specific rx queue sysfs attributes · a953be53

由 Michael Dalton 提交于 1月 16, 2014

Extend existing support for netdevice receive queue sysfs attributes to
permit a device-specific attribute group. Initial use case for this
support will be to allow the virtio-net device to export per-receive
queue mergeable receive buffer size.
Signed-off-by: NMichael Dalton <mwdalton@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a953be53

virtio-net: auto-tune mergeable rx buffer size for improved performance · ab7db917

由 Michael Dalton 提交于 1月 16, 2014

Commit 2613af0e ("virtio_net: migrate mergeable rx buffers to page frag
allocators") changed the mergeable receive buffer size from PAGE_SIZE to
MTU-size, introducing a single-stream regression for benchmarks with large
average packet size. There is no single optimal buffer size for all
workloads.  For workloads with packet size <= MTU bytes, MTU + virtio-net
header-sized buffers are preferred as larger buffers reduce the TCP window
due to SKB truesize. However, single-stream workloads with large average
packet sizes have higher throughput if larger (e.g., PAGE_SIZE) buffers
are used.

This commit auto-tunes the mergeable receiver buffer packet size by
choosing the packet buffer size based on an EWMA of the recent packet
sizes for the receive queue. Packet buffer sizes range from MTU_SIZE +
virtio-net header len to PAGE_SIZE. This improves throughput for
large packet workloads, as any workload with average packet size >=
PAGE_SIZE will use PAGE_SIZE buffers.

These optimizations interact positively with recent commit
ba275241 ("virtio-net: coalesce rx frags when possible during rx"),
which coalesces adjacent RX SKB fragments in virtio_net. The coalescing
optimizations benefit buffers of any size.

Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
between two QEMU VMs on a single physical machine. Each VM has two VCPUs
with all offloads & vhost enabled. All VMs and vhost threads run in a
single 4 CPU cgroup cpuset, using cgroups to ensure that other processes
in the system will not be scheduled on the benchmark CPUs. Trunk includes
SKB rx frag coalescing.

net-next w/ virtio_net before 2613af0e (PAGE_SIZE bufs): 14642.85Gb/s
net-next (MTU-size bufs):  13170.01Gb/s
net-next + auto-tune: 14555.94Gb/s

Jason Wang also reported a throughput increase on mlx4 from 22Gb/s
using MTU-sized buffers to about 26Gb/s using auto-tuning.
Signed-off-by: NMichael Dalton <mwdalton@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab7db917

virtio-net: use per-receive queue page frag alloc for mergeable bufs · fb51879d

由 Michael Dalton 提交于 1月 16, 2014

The virtio-net driver currently uses netdev_alloc_frag() for GFP_ATOMIC
mergeable rx buffer allocations. This commit migrates virtio-net to use
per-receive queue page frags for GFP_ATOMIC allocation. This change unifies
mergeable rx buffer memory allocation, which now will use skb_refill_frag()
for both atomic and GFP-WAIT buffer allocations.

To address fragmentation concerns, if after buffer allocation there
is too little space left in the page frag to allocate a subsequent
buffer, the remaining space is added to the current allocated buffer
so that the remaining space can be used to store packet data.
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NMichael Dalton <mwdalton@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb51879d

net: allow > 0 order atomic page alloc in skb_page_frag_refill · 097b4f19

由 Michael Dalton 提交于 1月 16, 2014

skb_page_frag_refill currently permits only order-0 page allocs
unless GFP_WAIT is used. Change skb_page_frag_refill to attempt
higher-order page allocations whether or not GFP_WAIT is used. If
memory cannot be allocated, the allocator will fall back to
successively smaller page allocs (down to order-0 page allocs).

This change brings skb_page_frag_refill in line with the existing
page allocation strategy employed by netdev_alloc_frag, which attempts
higher-order page allocations whether or not GFP_WAIT is set, falling
back to successively lower-order page allocations on failure. Part
of migration of virtio-net to per-receive queue page frag allocators.
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NMichael Dalton <mwdalton@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

097b4f19

net_sched: fix error return code in fw_change_attrs() · 722e47d7

由 Wei Yongjun 提交于 1月 17, 2014

The error code was not set if change indev fail, so the error
condition wasn't reflected in the return value. Fix to return a
negative error code from this error handling case instead of 0.

Fixes: 2519a602 ('net_sched: optimize tcf_match_indev()')
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

722e47d7

Merge branch 'tipc' · 8b88a11e

由 David S. Miller 提交于 1月 16, 2014

Ying Xue says:

====================
tipc: align TIPC behaviours of waiting for events with other stacks

Comparing the current implementations of waiting for events in TIPC
socket layer with other stacks, TIPC's behaviour is very different
because wait_event_interruptible_timeout()/wait_event_interruptible()
are always used by TIPC to wait for events while relevant socket or
port variables are fed to them as their arguments. As socket lock has
to be released temporarily before the two routines of waiting for
events are called, their arguments associated with socket or port
structures are out of socket lock protection. This might cause
serious issues where the process of calling socket syscall such as
sendsmg(), connect(), accept(), and recvmsg(), cannot be waken up
at all even if proper event arrives or improperly be woken up
although the condition of waking up the process is not satisfied
in practice.

Therefore, aligning its behaviours with similar functions implemented
in other stacks, for instance, sk_stream_wait_connect() and
inet_csk_wait_for_connect() etc, can avoid above risks for us.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b88a11e

tipc: standardize recvmsg routine · 9bbb4ecc

由 Ying Xue 提交于 1月 17, 2014

Standardize the behaviour of waiting for events in TIPC recvmsg()
so that all variables of socket or port structures are protected
within socket lock, allowing the process of calling recvmsg() to
be woken up at appropriate time.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bbb4ecc

tipc: standardize sendmsg routine of connected socket · 391a6dd1

由 Ying Xue 提交于 1月 17, 2014

Standardize the behaviour of waiting for events in TIPC send_packet()
so that all variables of socket or port structures are protected within
socket lock, allowing the process of calling sendmsg() to be woken up
at appropriate time.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

391a6dd1

tipc: standardize sendmsg routine of connectionless socket · 3f40504f

由 Ying Xue 提交于 1月 17, 2014

Comparing the behaviour of how to wait for events in TIPC sendmsg()
with other stacks, the TIPC implementation might be perceived as
different, and sometimes even incorrect. For instance, sk_sleep()
and tport->congested variables associated with socket are exposed
without socket lock protection while wait_event_interruptible_timeout()
accesses them. So standardizing it with similar implementation
in other stacks can help us correct these errors which the process
of calling sendmsg() cannot be woken up event if an expected event
arrive at socket or improperly woken up although the wake condition
doesn't match.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f40504f

tipc: standardize accept routine · 6398e23c

由 Ying Xue 提交于 1月 17, 2014

Comparing the behaviour of how to wait for events in TIPC accept()
with other stacks, the TIPC implementation might be perceived as
different, and sometimes even incorrect. As sk_sleep() and
sk->sk_receive_queue variables associated with socket are not
protected by socket lock, the process of calling accept() may be
woken up improperly or sometimes cannot be woken up at all. After
standardizing it with inet_csk_wait_for_connect routine, we can
get benefits including: avoiding 'thundering herd' phenomenon,
adding a timeout mechanism for accept(), coping with a pending
signal, and having sk_sleep() and sk->sk_receive_queue being
always protected within socket lock scope and so on.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6398e23c