提交 · c02e0fd393792b5f4c0992c97f64b9006eff334c · openeuler / raspberrypi-kernel

09 1月, 2014 3 次提交

i40e: use kernel specific defines · c02e0fd3

由 Jesse Brandeburg 提交于 12月 18, 2013

Replace uses of I40E_LENGTH_OF_ADDRESS with ETH_ALEN.
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

c02e0fd3

i40e: Re-enable interrupt on ICR0 · 5e823066

由 Anjali Singhai Jain 提交于 12月 18, 2013

The hardware can occasionally give an interrupt on the misc
queue for which there is no driver work to do.  In that case
the driver was not re-enabling interrupts even though they
were auto masked by hardware.  This left interrupts disabled
on this queue.

Re-enable the interrupt whenever leaving this function.
Signed-off-by: NAnjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

5e823066

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 54b553e2

由 David S. Miller 提交于 1月 08, 2014

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains three Netfilter updates, they are:

* Fix wrong usage of skb_header_pointer in the DCCP protocol helper that
  has been there for quite some time. It was resulting in copying the dccp
  header to a pointer allocated in the stack. Fortunately, this pointer
  provides room for the dccp header is 4 bytes long, so no crashes have been
  reported so far. From Daniel Borkmann.

* Use format string to print in the invocation of nf_log_packet(), again
  in the DCCP helper. Also from Daniel Borkmann.

* Revert "netfilter: avoid get_random_bytes call" as prandom32 does not
  guarantee enough entropy when being calling this at boot time, that may
  happen when reloading the rule.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54b553e2

08 1月, 2014 30 次提交

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 80077935

由 David S. Miller 提交于 1月 08, 2014

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e only.

Anjali adds more functionality to debugfs to assist development and
testing of admin queue commands.

Greg makes sure broadcast promiscuous is disabled by default, otherwise
VLAN tagged packets out of the assigned VLAN domain are received.  Also
provides a fix when the 8021q driver is loaded, so that VLAN 0 tagged
packets are accepted so that upper layers can interpret the priority
bits. Then provides a fix to let the VF to request the PF to set its
already assigned MAC address without generating an error.  Greg also
adds helper functions to enable or disable internal switch loopback
when VFs are created or destroyed via the sysfs interface.

Shannon provides most of the changes, where he adds code to ensure
that the hardware waits to make sure that the firmware is ready as well
after reset.  Also updates the code to use the new features in the
firmware.  Provides a fix while in MFP mode where resources are
reduced, so use a smaller range of test registers than when in SFP mode.
Moves the PF ID initialization code to earlier in the driver
initialization function since a few operations need the information
before the first PF reset is called.  Shannon adds a check for MAC
type before reading anything from the registers to ensure we dealing
with the correct MAC type.  Then reworks Shadow RAM read word/buffer
functions as to not block whole NVM resources for SR read operations.

Mitch lastly provides a fix to correctly setup ARQ descriptors in
the cleanup code.

Catherine bumps the driver version due to all the recent changes.

v2:
 - Added blank lines after local variable declarations to patch 01
   as suggested by David Miller
 - Used the suggested memset() line in patch 15 as suggested by David
   Miller
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

80077935

i40e: correctly setup ARQ descriptors · 90077773

由 Mitch Williams 提交于 12月 18, 2013

When cleaning descriptors, we must set up ALL fields, not just the DMA
address. The initial setup does this correctly, but not the cleanup
code, so the firmware would process the ring exactly once and then fail.

Change-ID: I2930b83c76194b3016a8ac0fa693f9a573995640
Signed-off-by: NMitch Williams <mitch.a.williams@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

90077773

i40e: remove redundant AQ enable · 071b8de9

由 Kamil Krawczyk 提交于 12月 18, 2013

The admin queue length register is updated in
config_a<sq|rq>_regs functions.  We should not update it again,
as we will trigger firmware to init the AQ again. In this case
firmware will lose the information about the AQ Rx tail position
and will see Rx queue as full (no free desc for FW to use).
Signed-off-by: NKamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

071b8de9

i40e: Enable/Disable PF switch LB on SR-IOV configure changes · c354229f

由 Greg Rose 提交于 12月 13, 2013

The PF VSI was never updated to enable or disable internal switch loopback
when VFs were created or destroyed via the sysfs interface.  Add some
helper functions to take care of that.
Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
Tested-by: NSibai Li <sibai.li@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

c354229f

i40e: whitespace paren and comment tweaks · 922680b9

由 Shannon Nelson 提交于 12月 18, 2013

Addresses a few code format issues that have crept in over time.

Change-Id: I1a62cbd16b29a218a933b0f7176abe748f9615e8
Signed-off-by: NMitch Williams <mitch.a.williams@intel.com>
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

922680b9

i40e: rework shadow ram read functions · a4bcfbb7

由 Shannon Nelson 提交于 12月 11, 2013

Rework Shadow RAM read word/buffer functions to not use AQ Request
Resource commands. Requesting resource is not needed for SR read
operations which are done through SRCTL register. Access to SR through
register is controlled through DONE bit within SRCTL. With this change
we do not block whole NVM resource for SR read operations.

Change-Id: I73e96cdea39a45ee7b5bdf038e527308de2d9efe
Signed-off-by: NKamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

a4bcfbb7

i40e: check MAC type before any REG access · af89d26c

由 Shannon Nelson 提交于 12月 11, 2013

We need to check if we are dealing with the correct MAC type before we
try to read anything from the registers.

Change-Id: I3989516999d06c3009e87d6a2eafc20af305c5c2
Signed-off-by: NKamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

af89d26c

i40e: move PF ID init from PF reset to SC init · 5f9116ac

由 Shannon Nelson 提交于 12月 11, 2013

Move PF ID initialization code from PF reset routine to earlier driver
initialization function.  There are a few operations which need the
PF ID before the first PF reset is called.

Change-Id: I7e971f7556b46a837149850ec05ce115c35db575
Signed-off-by: NKamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

5f9116ac

i40e: Reduce range of interrupt reg in reg test · af9810ea

由 Shannon Nelson 提交于 12月 11, 2013

Use a smaller range of test registers in MFP mode as there are fewer
resources than when in SFP mode.

Change-Id: I08424890c3f57b5dde5ee99e99724ce252e0875a
Signed-off-by: NKamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

af9810ea

i40e: update firmware api to 1.1 · 981b7545

由 Shannon Nelson 提交于 12月 11, 2013

The firmware's AdminQ interface has matured a little, so update the
code to use the new fields and values.

Change-Id: I8fcd7b443f268dcf9346bd6a9e940fe9c2958891
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

981b7545

i40e: Add code to wait for FW to complete in reset path · 42794bd8

由 Shannon Nelson 提交于 12月 11, 2013

The RSTAT comes back with DEVICE READY to indicate that the HW is ready,
but we still have to wait for the FW to indicate that the Core and Global
modules are back up again. This needs a read of the NVM_ULD register.

Change-Id: I88276165f9cd446d2f166fb4b8cff00521af4bec
Signed-off-by: NAnjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

42794bd8

i40e: Bump version · 7f61d1f7

由 Catherine Sullivan 提交于 11月 28, 2013

Version update to 0.3.25-k
Signed-off-by: NCatherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

7f61d1f7

i40e: Allow VF to set already assigned MAC address · 5017c2a8

由 Greg Rose 提交于 12月 07, 2013

The VF is allowed to request the PF to set its already assigned
MAC address without generating an error.

Change-Id: I8dfdf353396995dbbb26cafab4e42b451911da3d
Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: NShannon Nelson <shannon.nelson@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

5017c2a8

i40e: Stop accepting any VLAN tag on VLAN 0 filter set · 6bbac866

由 Greg Rose 提交于 11月 28, 2013

When the 8021q driver is loaded it sets VLAN 0 filters on all devices.
This does not mean that any VLAN tagged packet should be accepted.
Instead accept only VLAN 0 tagged packets so that upper layers can
interpret the priority bits.

Change-Id: I17274a540b613749612ffe23a3aef2b8ee6ff6a4
Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NSibai Li <sibai.li@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

6bbac866

i40e: Do not enable broadcast promiscuous by default · 1a10370a

由 Greg Rose 提交于 11月 28, 2013

Broadcast promiscuous should only be turned on when general
promiscuous mode is turned on, otherwise VLAN tagged packets out of
the assigned VLAN domain are received.

Add a broadcast MAC filter in order to continue to receive
broadcast traffic on VLANs, MAIN or VMDQ VSI.

Change-Id: I99d8e382a082ee51201228f1226af3b46452ac55
Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NSibai Li <sibai.li@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

1a10370a

i40e: Expose AQ debugfs hooks · f1143c4b

由 Anjali Singhai Jain 提交于 11月 28, 2013

Add more functionality to debugfs to assist development
and testing of AQ commands.

adds:
send aq_cmd
send indirect aq_cmd

Change-Id: I01710cddd33110a6c1e1aabf84cb6e93cda4c42a
Signed-off-by: NAnjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NKavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

f1143c4b

net: xfrm: xfrm_policy: silence compiler warning · da7c224b

由 Ying Xue 提交于 1月 08, 2014

Fix below compiler warning:

net/xfrm/xfrm_policy.c:1644:12: warning: ‘xfrm_dst_alloc_copy’ defined but not used [-Wunused-function]
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

da7c224b

Merge branch 'tipc' · 8752b5ca

由 David S. Miller 提交于 1月 07, 2014

Jon Maloy says:

====================
tipc: link setup and failover improvements

This series consists of four unrelated commits with different purposes.

- Commit #1 is purely cosmetic and pedagogic, hopefully making the
  failover/tunneling logics slightly easier to understand.
- Commit #2 fixes a bug that has always been in the code, but was not
  discovered until very recently.
- Commit #3 fixes a non-fatal race issue in the neighbour discovery
  code.
- Commit #4 removes an unnecessary indirection step during link
  startup.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8752b5ca

tipc: make link start event synchronous · 581465fa

由 Jon Paul Maloy 提交于 1月 07, 2014

When a link is created we delay the start event by launching it
to be executed later in a tasklet. As we hold all the
necessary locks at the moment of creation, and there is no risk
of deadlock or contention, this delay serves no purpose in the
current code.

We remove this obsolete indirection step, and the associated function
link_start(). At the same time, we rename the function tipc_link_stop()
to the more appropriate tipc_link_purge_queues().
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

581465fa

tipc: introduce new spinlock to protect struct link_req · f9a2c80b

由 Ying Xue 提交于 1月 07, 2014

Currently, only 'bearer_lock' is used to protect struct link_req in
the function disc_timeout(). This is unsafe, since the member fields
'num_nodes' and 'timer_intv' might be accessed by below three different
threads simultaneously, none of them grabbing bearer_lock in the
critical region:

link_activate()
  tipc_bearer_add_dest()
    tipc_disc_add_dest()
      req->num_nodes++;

tipc_link_reset()
  tipc_bearer_remove_dest()
    tipc_disc_remove_dest()
      req->num_nodes--
      disc_update()
        read req->num_nodes
	write req->timer_intv

disc_timeout()
  read req->num_nodes
  read/write req->timer_intv

Without lock protection, the only symptom of a race is that discovery
messages occasionally may not be sent out. This is not fatal, since such
messages are best-effort anyway. On the other hand, since discovery
messages are not time critical, adding a protecting lock brings no
serious overhead either. So we add a new, dedicated spinlock in
order to guarantee absolute data consistency in link_req objects.
This also helps reduce the overall role of the bearer_lock, which
we want to remove completely in a later commit series.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f9a2c80b

tipc: remove 'has_redundant_link' flag from STATE link protocol messages · b9d4c339

由 Jon Paul Maloy 提交于 1月 07, 2014

The flag 'has_redundant_link' is defined only in RESET and ACTIVATE
protocol messages. Due to an ambiguity in the protocol specification it
is currently also transferred in STATE messages. Its value is used to
initialize a link state variable, 'permit_changeover', which is used
to inhibit futile link failover attempts when it is known that the
peer node has no working links at the moment, although the local node
may still think it has one.

The fact that 'has_redundant_link' incorrectly is read from STATE
messages has the effect that 'permit_changeover' sometimes gets a wrong
value, and permanently blocks any links from being re-established. Such
failures can only occur in in dual-link systems, and are extremely rare.
This bug seems to have always been present in the code.

Furthermore, since commit b4b56102
("tipc: Ensure both nodes recognize loss of contact between them"),
the 'permit_changeover' field serves no purpose any more. The task of
enforcing 'lost contact' cycles at both peer endpoints is now taken
by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN
in struct tipc_node to abort unnecessary failover attempts.

We therefore remove the 'has_redundant_link' flag from STATE messages,
as well as the now redundant 'permit_changeover' variable.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9d4c339

tipc: rename functions related to link failover and improve comments · 170b3927

由 Jon Paul Maloy 提交于 1月 07, 2014

The functionality related to link addition and failover is unnecessarily
hard to understand and maintain. We try to improve this by renaming
some of the functions, at the same time adding or improving the
explanatory comments around them. Names such as "tipc_rcv()" etc. also
align better with what is used in other networking components.

The changes in this commit are purely cosmetic, no functional changes
are made.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

170b3927

net: skbuff: const-ify casts in skb_queue_* functions · fd44b93c

由 Daniel Borkmann 提交于 1月 07, 2014

We should const-ify comparisons on skb_queue_* inline helper
functions as their parameters are const as well, so lets not
drop that.
Suggested-by: NBrad Spengler <spender@grsecurity.net>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd44b93c

net: xfrm: xfrm_policy: fix inline not at beginning of declaration · be7928d2

由 Daniel Borkmann 提交于 1月 07, 2014

Fix three warnings related to:

net/xfrm/xfrm_policy.c:1644:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
net/xfrm/xfrm_policy.c:1656:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
net/xfrm/xfrm_policy.c:1668:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]

Just removing the inline keyword is sufficient as the compiler will
decide on its own about inlining or not.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

be7928d2

mlx4_en: Select PTP_1588_CLOCK · 74b9c3ea

由 Shawn Bohrer 提交于 1月 07, 2014

Now that mlx4_en includes a PHC driver it must select PTP_1588_CLOCK.

   drivers/built-in.o: In function `mlx4_en_get_ts_info':
>> en_ethtool.c:(.text+0x391a11): undefined reference to `ptp_clock_index'
   drivers/built-in.o: In function `mlx4_en_remove_timestamp':
>> (.text+0x397913): undefined reference to `ptp_clock_unregister'
   drivers/built-in.o: In function `mlx4_en_init_timestamp':
>> (.text+0x397b20): undefined reference to `ptp_clock_register'

Fixes: ad7d4eae ("mlx4_en: Add PTP hardware clock")
Signed-off-by: NShawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74b9c3ea

net-gre-gro: Add GRE support to the GRO stack · bf5a755f

由 Jerry Chu 提交于 1月 07, 2014

This patch built on top of Commit 299603e8
("net-gro: Prepare GRO stack for the upcoming tunneling support") to add
the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO
stack. It also serves as an example for supporting other encapsulation
protocols in the GRO stack in the future.

The patch supports version 0 and all the flags (key, csum, seq#) but
will flush any pkt with the S (seq#) flag. This is because the S flag
is not support by GSO, and a GRO pkt may end up in the forwarding path,
thus requiring GSO support to break it up correctly.

Currently the "packet_offload" structure only contains L3 (ETH_P_IP/
ETH_P_IPV6) GRO offload support so the encapped pkts are limited to
IP pkts (i.e., w/o L2 hdr). But support for other protocol type can
be easily added, so is the support for GRE variations like NVGRE.

The patch also support csum offload. Specifically if the csum flag is on
and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE),
the code will take advantage of the csum computed by the h/w when
validating the GRE csum.

Note that commit 60769a5d "ipv4: gre:
add GRO capability" already introduces GRO capability to IPv4 GRE
tunnels, using the gro_cells infrastructure. But GRO is done after
GRE hdr has been removed (i.e., decapped). The following patch applies
GRO when pkts first come in (before hitting the GRE tunnel code). There
is some performance advantage for applying GRO as early as possible.
Also this approach is transparent to other subsystem like Open vSwitch
where GRE decap is handled outside of the IP stack hence making it
harder for the gro_cells stuff to apply. On the other hand, some NICs
are still not capable of hashing on the inner hdr of a GRE pkt (RSS).
In that case the GRO processing of pkts from the same remote host will
all happen on the same CPU and the performance may be suboptimal.

I'm including some rough preliminary performance numbers below. Note
that the performance will be highly dependent on traffic load, mix as
usual. Moreover it also depends on NIC offload features hence the
following is by no means a comprehesive study. Local testing and tuning
will be needed to decide the best setting.

All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs.
(super_netperf 50 -H 192.168.1.18 -l 30)

An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1
mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123)
is configured.

The GRO support for pkts AFTER decap are controlled through the device
feature of the GRE device (e.g., ethtool -K gre1 gro on/off).

1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off
thruput: 9.16Gbps
CPU utilization: 19%

1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off
thruput: 5.9Gbps
CPU utilization: 15%

1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 12-13%

1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 10%

The following tests were performed on a different NIC that is capable of
csum offload. I.e., the h/w is capable of computing IP payload csum
(CHECKSUM_COMPLETE).

2.1 ethtool -K gre1 gro on (hence will use gro_cells)

2.1.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 8.53Gbps
CPU utilization: 9%

2.1.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 8.97Gbps
CPU utilization: 7-8%

2.1.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 8.83Gbps
CPU utilization: 5-6%

2.1.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.98Gbps
CPU utilization: 5%

2.2 ethtool -K gre1 gro off

2.2.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 5.93Gbps
CPU utilization: 9%

2.2.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 5.62Gbps
CPU utilization: 8%

2.2.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 7.69Gbps
CPU utilization: 8%

2.2.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.96Gbps
CPU utilization: 5-6%
Signed-off-by: NH.K. Jerry Chu <hkchu@google.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf5a755f

net: Do not enable tx-nocache-copy by default · cdb3f4a3

由 Benjamin Poirier 提交于 1月 07, 2014

There are many cases where this feature does not improve performance or even
reduces it.

For example, here are the results from tests that I've run using 3.12.6 on one
Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
from the Xeon, but they're similar on the i7. All numbers report the
mean±stddev over 10 runs of 10s.

1) latency tests similar to what is described in "c6e1a0d1 net: Allow no-cache
copy from user on transmit"
There is no statistically significant difference between tx-nocache-copy
on/off.
nic irqs spread out (one queue per cpu)

200x netperf -r 1400,1
tx-nocache-copy off
        692000±1000 tps
        50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
tx-nocache-copy on
        693000±1000 tps
        50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7

200x netperf -r 14000,14000
tx-nocache-copy off
        86450±80 tps
        50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
tx-nocache-copy on
        86110±60 tps
        50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20

2) single stream throughput tests
tx-nocache-copy leads to higher service demand

                        throughput  cpu0        cpu1        demand
                        (Gb/s)      (Gcycle)    (Gcycle)    (cycle/B)

nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)

tx-nocache-copy off     9402±5      9.4±0.2                 0.80±0.01
tx-nocache-copy on      9403±3      9.85±0.04               0.838±0.004

nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)

tx-nocache-copy off     9401±5      5.83±0.03   5.0±0.1     0.923±0.007
tx-nocache-copy on      9404±2      5.74±0.03   5.523±0.009 0.958±0.002

As a second example, here are some results from Eric Dumazet with latest
net-next.
tx-nocache-copy also leads to higher service demand

(cpu is Intel(R) Xeon(R) CPU X5660  @ 2.80GHz)

lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      9407.44   2.50     -1.00    0.522   -1.000

 Performance counter stats for './netperf -H lpq84 -c':

       4282.648396 task-clock                #    0.423 CPUs utilized
             9,348 context-switches          #    0.002 M/sec
                88 CPU-migrations            #    0.021 K/sec
               355 page-faults               #    0.083 K/sec
    11,812,797,651 cycles                    #    2.758 GHz                     [82.79%]
     9,020,522,817 stalled-cycles-frontend   #   76.36% frontend cycles idle    [82.54%]
     4,579,889,681 stalled-cycles-backend    #   38.77% backend  cycles idle    [67.33%]
     6,053,172,792 instructions              #    0.51  insns per cycle
                                             #    1.49  stalled cycles per insn [83.64%]
       597,275,583 branches                  #  139.464 M/sec                   [83.70%]
         8,960,541 branch-misses             #    1.50% of all branches         [83.65%]

      10.128990264 seconds time elapsed

lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      9412.45   2.15     -1.00    0.449   -1.000

 Performance counter stats for './netperf -H lpq84 -c':

       2847.375441 task-clock                #    0.281 CPUs utilized
            11,632 context-switches          #    0.004 M/sec
                49 CPU-migrations            #    0.017 K/sec
               354 page-faults               #    0.124 K/sec
     7,646,889,749 cycles                    #    2.686 GHz                     [83.34%]
     6,115,050,032 stalled-cycles-frontend   #   79.97% frontend cycles idle    [83.31%]
     1,726,460,071 stalled-cycles-backend    #   22.58% backend  cycles idle    [66.55%]
     2,079,702,453 instructions              #    0.27  insns per cycle
                                             #    2.94  stalled cycles per insn [83.22%]
       363,773,213 branches                  #  127.757 M/sec                   [83.29%]
         4,242,732 branch-misses             #    1.17% of all branches         [83.51%]

      10.128449949 seconds time elapsed

CC: Tom Herbert <therbert@google.com>
Signed-off-by: NBenjamin Poirier <bpoirier@suse.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cdb3f4a3

ipv4: loopback device: ignore value changes after device is upped · dfd1582d

由 Jiri Pirko 提交于 1月 07, 2014

When lo is brought up, new ifa is created. Then, devconf and neigh values
bitfield should be set so later changes of default values would not
affect lo values.

Note that the same behaviour is in ipv6. Also note that this is likely
not an issue in many distros (for example Fedora 19) because userspace
sets address to lo manually before bringing it up.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dfd1582d

IPv6: add the option to use anycast addresses as source addresses in echo reply · 509aba3b

由 FX Le Bail 提交于 1月 07, 2014

This change allows to follow a recommandation of RFC4942.

- Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
  as source addresses for ICMPv6 echo reply. This sysctl is false by default
  to preserve existing behavior.
- Add inline check ipv6_anycast_destination().
- Use them in icmpv6_echo_reply().

Reference:
RFC4942 - IPv6 Transition/Coexistence Security Considerations
   (http://tools.ietf.org/html/rfc4942#section-2.1.6)

2.1.6. Anycast Traffic Identification and Security

   [...]
   To avoid exposing knowledge about the internal structure of the
   network, it is recommended that anycast servers now take advantage of
   the ability to return responses with the anycast address as the
   source address if possible.
Signed-off-by: NFrancois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

509aba3b

net/mlx4_en: fix error return code in mlx4_en_get_qp() · 9ba75fb0

由 Wei Yongjun 提交于 1月 07, 2014

Fix to return a negative error code from the error handling
case instead of 0.

Fixes: 837052d0 ('net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling')
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ba75fb0

07 1月, 2014 7 次提交

r8152: correct some messages · 4a8deae2

由 Hayes Wang 提交于 1月 07, 2014

 - Replace pr_warn_ratelimited() with net_ratelimit() and netdev_warn().
 - Adjust the algnment of some messages.
 - Remove the peroid.
 - Fix some messages don't have terminating newline.
Signed-off-by: NHayes Wang <hayeswang@realtek.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4a8deae2

bna: Fix build due to missing use of dma_unmap_len_set() · ecca6a96

由 David S. Miller 提交于 1月 06, 2014

> as reported for linux-next of Dec.20, 2013
> when CONFIG_NEED_DMA_MAP_STATE is not enabled:
>
> drivers/net/ethernet/brocade/bna/bnad.c: In function 'bnad_start_xmit':
> drivers/net/ethernet/brocade/bna/bnad.c:3074:26: error: 'struct bnad_tx_vector' has no member named 'dma_len'
Reported-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ecca6a96

gre_offload: statically build GRE offloading support · 438e38fa

由 Eric Dumazet 提交于 1月 06, 2014

GRO/GSO layers can be enabled on a node, even if said
node is only forwarding packets.

This patch permits GSO (and upcoming GRO) support for GRE
encapsulated packets, even if the host has no GRE tunnel setup.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

438e38fa

bgmac: fix typos · 0e595934

由 Hauke Mehrtens 提交于 1月 06, 2014

This fixes some typos found by Sergei.
Reported-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: NHauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e595934

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch · 39b6b299

由 David S. Miller 提交于 1月 06, 2014

Jesse Gross says:

====================
[GIT net-next] Open vSwitch

Open vSwitch changes for net-next/3.14. Highlights are:
 * Performance improvements in the mechanism to get packets to userspace
   using memory mapped netlink and skb zero copy where appropriate.
 * Per-cpu flow stats in situations where flows are likely to be shared
   across CPUs. Standard flow stats are used in other situations to save
   memory and allocation time.
 * A handful of code cleanups and rationalization.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

39b6b299

ovs: make functions local · 443cd88c

由 Stephen Hemminger 提交于 12月 17, 2013

Several functions and datastructures could be local
Found with 'make namespacecheck'
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NJesse Gross <jesse@nicira.com>

443cd88c

openvswitch: Compute checksum in skb_gso_segment() if needed · 09c5e605

由 Thomas Graf 提交于 12月 13, 2013

The copy & csum optimization is no longer present with zerocopy
enabled. Compute the checksum in skb_gso_segment() directly by
dropping the HW CSUM capability from the features passed in.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NJesse Gross <jesse@nicira.com>

09c5e605