Commits · 5a2138812604c32b7617b3b2e53e336617121d3b · openeuler / raspberrypi-kernel

19 Nov, 2016 36 commits

af_packet: Use virtio_net_hdr_from_skb() directly. · 5a213881

Committed by Jarno Rajahalme on Nov 18, 2016

Remove static function __packet_rcv_vnet(), which only called
virtio_net_hdr_from_skb() and BUG()ged out if an error code was
returned.  Instead, call virtio_net_hdr_from_skb() from the former
call sites of __packet_rcv_vnet() and actually use the error handling
code that is already there.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: Use virtio_net_hdr_to_skb(). · db60eb5f

Committed by Jarno Rajahalme on Nov 18, 2016

Use the common virtio_net_hdr_to_skb() instead of open coding it.
Other call sites were changed by commit fd2a0437, but this one was
missed, maybe because it is split in two parts of the source code.

Interim comparisons of 'vnet_hdr->gso_type' still work as both the
vnet_hdr and skb notions of gso_type are zero when there is no gso.

Fixes: fd2a0437 ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: Do not clear memory for struct virtio_net_hdr twice. · 9403cd7c

Committed by Jarno Rajahalme on Nov 18, 2016

virtio_net_hdr_from_skb() clears the memory for the header, so there
is no point for the callers to do the same.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net.h: Fix comment. · d66016a7

Committed by Jarno Rajahalme on Nov 18, 2016

Fix incorrect comment after the final #endif.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: Simplify call sites for virtio_net_hdr_{from, to}_skb(). · 3e9e40e7

Committed by Jarno Rajahalme on Nov 18, 2016

No point storing the return value of virtio_net_hdr_to_skb() or
virtio_net_hdr_from_skb() to a variable when the value is used only
once as a boolean in an immediately following if statement.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
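
The call-site simplification described above can be sketched as follows. This is only an illustration: `parse_hdr()` is a hypothetical stand-in that mimics a helper returning 0 on success and non-zero on error, and it does not reproduce the real signature of virtio_net_hdr_to_skb() or virtio_net_hdr_from_skb().

```c
#include <stddef.h>

/* Hypothetical helper: 0 on success, -1 on error. */
static int parse_hdr(const void *hdr)
{
	return hdr ? 0 : -1;
}

/* Before: the result is stored, tested once, and never used again. */
static int rx_before(const void *hdr)
{
	int ret;

	ret = parse_hdr(hdr);
	if (ret)
		return -1;
	return 0;
}

/* After: the call sits directly in the condition. */
static int rx_after(const void *hdr)
{
	if (parse_hdr(hdr))
		return -1;
	return 0;
}
```

Both variants behave identically; the second simply drops a variable that carried no extra information.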

cxgb4: Allocate Tx queues dynamically · ab677ff4

Committed by Hariprasad Shenai on Nov 18, 2016

Allocate resources dynamically for upper-layer drivers (ULDs) like
cxgbit, iw_cxgb4, cxgb4i and chcr. The resources allocated include Tx
queues, which are allocated when a ULD registers with the cxgb4 driver and
freed when it unregisters. Tx queues that are shared between ULDs are
allocated by the first driver to register and freed by the last driver
to unregister.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
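
The "first registration allocates, last unregistration frees" rule for shared Tx queues is a reference-counting pattern. A minimal user-space sketch follows, assuming a single shared resource; `struct shared_txq`, `txq_get()` and `txq_put()` are hypothetical names and are not taken from the cxgb4 driver.

```c
#include <stdlib.h>

/* Hypothetical shared resource with a user count. */
struct shared_txq {
	int users;     /* number of registered upper-layer drivers */
	void *queues;  /* allocated only while users > 0 */
};

/* Called on registration: the first caller allocates the queues. */
static int txq_get(struct shared_txq *s, size_t bytes)
{
	if (s->users == 0) {
		s->queues = calloc(1, bytes);
		if (!s->queues)
			return -1;
	}
	s->users++;
	return 0;
}

/* Called on unregistration: the last caller frees the queues. */
static void txq_put(struct shared_txq *s)
{
	if (s->users > 0 && --s->users == 0) {
		free(s->queues);
		s->queues = NULL;
	}
}
```

A real driver would also need locking around the counter; that is left out here to keep the sketch short.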

liquidio CN23XX: bitwise vs logical AND typo · c816061d

Committed by Dan Carpenter on Nov 18, 2016

We obviously intended a bitwise AND here, not a logical one.

Fixes: 8c978d05 ("liquidio CN23XX: Mailbox support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
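
To illustrate the class of bug fixed here (not the actual liquidio code), the sketch below contrasts a logical AND with a bitwise AND on a flag test. `FLAG_READY` is a hypothetical flag value chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_READY 0x4u /* hypothetical flag bit, for illustration only */

/* Buggy form: `&&` only asks whether both operands are non-zero,
 * so any non-zero status passes the test. */
static bool ready_logical(uint32_t status)
{
	return status && FLAG_READY;
}

/* Intended form: `&` tests the specific bit. */
static bool ready_bitwise(uint32_t status)
{
	return (status & FLAG_READY) != 0;
}
```

With `status = 0x1` the logical form reports true even though the flag bit is clear, while the bitwise form reports false.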

lan78xx: relocate mdix setting to phy driver · f6e3ef3e

Committed by Woojung Huh on Nov 17, 2016

Relocate mdix code to phy driver to be called at config_init().
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-marvell-freescale-compile-test' · 82e527df

Committed by David S. Miller on Nov 18, 2016

Florian Fainelli says:

====================
net: Enable COMPILE_TEST for Marvell & Freescale drivers

This patch series allows building the Freescale and Marvell Ethernet network
drivers with COMPILE_TEST.

Changes in v4:

- add proper HAS_DMA to fix build errors on m32r
- provide an inline stub for mvebu_mbus_get_dram_win_info
- added an additional patch to fix build errors with mv88e6xxx on m32r

Changes in v3:

- reorder patches to avoid introducing a build warning between commits

Changes in v2:

- rename register define clash when building for i386 (spotted by LKP)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Select IRQ_DOMAIN · 0717b876

Committed by Florian Fainelli on Nov 17, 2016

Some architectures (like m32r) may not define IRQ_DOMAIN; select it to
fix undefined references to IRQ_DOMAIN functions.

Fixes: dc30c35b ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: marvell: Allow drivers to be built with COMPILE_TEST · a0627f77

Committed by Florian Fainelli on Nov 17, 2016

All Marvell Ethernet drivers actually build fine with COMPILE_TEST with
a few warnings. We need to add a few HAS_DMA dependencies to fix linking
failures on problematic architectures like m32r.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bus: mvebu-bus: Provide inline stub for mvebu_mbus_get_dram_win_info · 603ab573

Committed by Florian Fainelli on Nov 17, 2016

In preparation for allowing CONFIG_MVNETA_BM to build with COMPILE_TEST,
provide an inline stub for mvebu_mbus_get_dram_win_info().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
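
The inline-stub technique mentioned above can be sketched as follows. This is not the kernel header: `SKETCH_HAVE_MBUS` stands in for the real configuration symbol, the function name and parameter list are hypothetical, and the `-EINVAL` return value is an assumption about what such a stub might return.

```c
#include <errno.h>
#include <stdint.h>

/* SKETCH_HAVE_MBUS is a hypothetical stand-in for a Kconfig symbol. */
#ifdef SKETCH_HAVE_MBUS
/* Real implementation lives in the bus driver when it is built. */
int sketch_mbus_get_dram_win_info(uint64_t phyaddr, uint8_t *target,
				  uint8_t *attr);
#else
/* Stub: lets callers compile and link when the bus driver is absent. */
static inline int sketch_mbus_get_dram_win_info(uint64_t phyaddr,
						uint8_t *target,
						uint8_t *attr)
{
	(void)phyaddr;
	(void)target;
	(void)attr;
	return -EINVAL; /* assumed error code */
}
#endif
```

Because the stub is `static inline` in a header, callers need no `#ifdef` of their own; they just see an error return on configurations without the bus driver.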

net: fsl: Allow most drivers to be built with COMPILE_TEST · 0827be21

Committed by Florian Fainelli on Nov 17, 2016

There are only a handful of Freescale Ethernet drivers that don't
actually build with COMPILE_TEST:

* FEC, for which we would need to define a default register layout if no
  supported architecture is defined

* UCC_GETH which depends on PowerPC cpm.h header (which could be moved
  to a generic location)

* GIANFAR needs to depend on HAS_DMA to fix linking failures on some
  architectures (like m32r)

We need to fix an unmet dependency to get there though:
warning: (FSL_XGMAC_MDIO) selects OF_MDIO which has unmet direct
dependencies (OF && PHYLIB)

which would result in CONFIG_OF_MDIO=[ym] without CONFIG_OF being set.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gianfar_ptp: Rename FS bit to FIPERST · 00a19e55

Committed by Florian Fainelli on Nov 17, 2016

FS is a global symbol used by the x86 32-bit architecture; rename the bit
to fix build redefinition warnings:

>> drivers/net/ethernet/freescale/gianfar_ptp.c:75:0: warning: "FS" redefined
    #define FS                    (1<<28) /* FIPER start indication */

   In file included from arch/x86/include/uapi/asm/ptrace.h:5:0,
                    from arch/x86/include/asm/ptrace.h:6,
                    from arch/x86/include/asm/math_emu.h:4,
                    from arch/x86/include/asm/processor.h:11,
                    from include/linux/mutex.h:19,
                    from include/linux/kernfs.h:13,
                    from include/linux/sysfs.h:15,
                    from include/linux/kobject.h:21,
                    from include/linux/device.h:17,
                    from drivers/net/ethernet/freescale/gianfar_ptp.c:23:
   arch/x86/include/uapi/asm/ptrace-abi.h:15:0: note: this is the location of the previous definition
    #define FS 9
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Update connection validation for backplane mode · 5a4e4c8f

Committed by Lendacky, Thomas on Nov 17, 2016

Update the connection type enumeration for backplane mode and return
an error when there is a mismatch between the mode and the connection
type.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ethtool-phy-downshift' · d3c19c0a

Committed by David S. Miller on Nov 18, 2016

Allan W. Nielsen says:

====================
Adding PHY-Tunables and downshift support

(This is a re-post of the v3 patch set with a new cover letter - I was not
aware that cover letters were used as commit comments in merge commits).

This series adds support for PHY tunables, and uses this facility to
configure downshifting. The downshifting mechanism is implemented for MSCC
phys.

This series tries to address the comments provided back in mid October when
this feature was posted along with fast-link-failure. Fast-link-failure has
been separated out, but we would like to continue with that if/when we
agree on how the phy-tunables and downshifting should be done.

The proposed generic interface is similar to
ETHTOOL_GTUNABLE/ETHTOOL_STUNABLE, it uses the same type
(ethtool_tunable/tunable_type_id) but a new enum (phy_tunable_id) is added
to reflect the PHY tunable.

The implementation just calls the newly added get_tunable/set_tunable
function pointers in the phy_device structure.

To configure downshifting, the ethtool_tunable structure is used. 'id' must
be set to 'ETHTOOL_PHY_DOWNSHIFT', 'type_id' must be set to
'ETHTOOL_TUNABLE_U8' and the 'data' value configures the number of downshift
retries.

If configured to DOWNSHIFT_DEV_DISABLE, then downshift is disabled. If
configured to DOWNSHIFT_DEV_DEFAULT_COUNT, then it is up to the device to
choose a device-specific retry count.

Tested on Beaglebone Black with VSC 8531 PHY.

Change set:
v0:
- Link Speed downshift and Fast Link failure-2 features coded by using
  Device tree.
v1:
- Split the Downshift and FLF2 features into different sets of patches.
- Removed DT access and implemented IOCTL access as suggested by Andrew.
- Added function pointers in get_tunable/set_tunable phy_device structure
v2:
- Added a trace message with a hint, printed when downshifting could not
  be enabled with the requested count
- (ethtool) Syntax is changed from "--set-phy-tunable downshift on|off|%d"
  to "--set-phy-tunable [downshift on|off [count N]]" - as requested by
  Andrew.
v3:
- Fixed spelling in "net: phy: Add downshift get/set support in Microsemi
  PHYs driver"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add downshift get/set support in Microsemi PHYs driver · 310d9ad5

Committed by Raju Lakkaraju on Nov 17, 2016

Implement the phy tunable function pointers and the downshift
functionality for MSCC PHYs.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Core impl for ETHTOOL_PHY_DOWNSHIFT tunable · 65feddd5

Committed by Raju Lakkaraju on Nov 17, 2016

Adding validation support for the ETHTOOL_PHY_DOWNSHIFT. Functional
implementation needs to be done in the individual PHY drivers.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: (uapi) Add ETHTOOL_PHY_DOWNSHIFT to PHY tunables · 607c7029

Committed by Raju Lakkaraju on Nov 17, 2016

For operation in cabling environments that are incompatible with
1000BASE-T, a PHY device may provide an automatic link speed downshift
operation. When enabled, the device automatically changes its 1000BASE-T
auto-negotiation to the next slower speed after a configured number of
failed attempts at 1000BASE-T.  This feature is useful when setting up
networks using older cable installations that include only pairs A and B,
and not pairs C and D.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE · 968ad9da

Committed by Raju Lakkaraju on Nov 17, 2016

Add get_tunable/set_tunable function pointers to the phy_driver
structure, and use these function pointers to implement the
ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE ioctls.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: (uapi) Add ETHTOOL_PHY_GTUNABLE and ETHTOOL_PHY_STUNABLE · 0d27f4e4

Committed by Raju Lakkaraju on Nov 17, 2016

Define a generic API to get/set phy tunables. The API uses the
existing ethtool_tunable/tunable_type_id types, which are already used
for MAC-level tunables.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-next' · 511d5d5b

Committed by David S. Miller on Nov 18, 2016

Saeed Mahameed says:

====================
Mellanox 100G mlx5 update 2016-11-15

This series contains four humble mlx5 features.

From Gal,
 - Add the support for PCIe statistics and expose them in ethtool

From Huy,
 - Add the support for port module events reporting and statistics
 - Add the support for driver version setting into FW (for display purposes only)

From Mohamad,
 - Extended the command interface cache flexibility

This series was generated against commit
6a02f5eb ("Merge branch 'mlxsw-i2c'")

V2:
 - Changed plain "unsigned" to "unsigned int"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Expose PCIe statistics to ethtool · 9c726239

Committed by Gal Pressman on Nov 17, 2016

This patch exposes two groups of PCIe counters:
- Performance counters.
- Timers and states counters.
Queried with ethtool -S <devname>.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Add MPCNT register infrastructure · 7f503169

Committed by Gal Pressman on Nov 17, 2016

Add the needed infrastructure for future use of MPCNT register.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Set driver version into firmware · 012e50e1

Committed by Huy Nguyen on Nov 17, 2016

If driver_version capability bit is enabled, set driver version
to firmware after the init HCA command, for display purposes.

Example of driver version: "Linux,mlx5_core,3.0-1"
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Set driver version infrastructure · 0dbc6fe0

Committed by Saeed Mahameed on Nov 17, 2016

Add the driver_version capability bit and the set driver
version command to the mlx5_ifc firmware header.  The only purpose
of this command is to store a driver version/OS string in FW
to be reported and displayed in various management systems,
such as IPMI/BMC.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Add port module event counters to ethtool stats · bedb7c90

Committed by Huy Nguyen on Nov 17, 2016

Add port module event counters to the ethtool -S command.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Add handling for port module event · d4eb4cd7

Committed by Huy Nguyen on Nov 17, 2016

For each asynchronous port module event:
  1. print with ratelimit to the dmesg log
  2. increment the corresponding event counter
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Port module event hardware structures · 4ce3bf2f

Committed by Huy Nguyen on Nov 17, 2016

Add hardware structures and constants definitions needed for module
events support.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Make the command interface cache more flexible · 0ac3ea70

Committed by Mohamad Haj Yahia on Nov 17, 2016

Add more cache command size sets and more entries for each set, based on
the sizes and frequencies of the commands in the current command set.

Fixes: e126ba97 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sfc-tso-v2' · 7a8bca04

Committed by David S. Miller on Nov 18, 2016

Edward Cree says:

====================
sfc: Firmware-Assisted TSO version 2

The firmware on 8000 series SFC NICs supports a new TSO API ("FATSOv2"), and
 7000 series NICs will also support this in an imminent release.  This series
 adds driver support for this TSO implementation.
The series also removes SWTSO, as it's now equivalent to GSO.  This does not
 actually remove very much code, because SWTSO was grotesquely intertwingled
 with FATSOv1, which will also be removed once 7000 series supports FATSOv2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: remove Software TSO · 46d1efd8

Committed by Edward Cree on Nov 17, 2016

It gives no advantage over GSO now that xmit_more exists. If we find
ourselves unable to handle a TSO skb (because our TXQ doesn't have a
TSOv2 context and the NIC doesn't support TSOv1), hand it back to GSO.
Also do that if the TSO handler fails with EINVAL for any other reason.
As Falcon-architecture NICs don't support any firmware-assisted TSO,
they no longer advertise TSO feature flags at all.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: handle failure to allocate TSOv2 contexts · e638ee1d

Committed by Edward Cree on Nov 17, 2016

If we fail to init the TXQ because of insufficient TSOv2 contexts,
try again with TSOv2 disabled.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Firmware-Assisted TSO version 2 · e9117e50

Committed by Bert Kenward on Nov 17, 2016

Add support for FATSOv2 to the driver. FATSOv2 offloads far more of the task
 of TCP segmentation to the firmware, such that we now just pass a single
 super-packet to the NIC. This means TSO has a great deal in common with a
 normal DMA transmit, apart from adding a couple of option descriptors.
 NIC-specific checks have been moved off the fast path and in to
 initialisation where possible.

This also moves FATSOv1/SWTSO to a new file (tx_tso.c).  The end of transmit
 and some error handling is now outside TSO, since it is common with other
 code.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Update EF10 register definitions · e17705c4

Committed by Edward Cree on Nov 17, 2016

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Update MCDI protocol definitions · ece0cc17

Committed by Edward Cree on Nov 17, 2016

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Nov, 2016 4 commits

netns: make struct pernet_operations::id unsigned int · c7d03a00

Committed by Alexey Dobriyan on Nov 17, 2016

Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into a zero-based array and
thus is an unsigned entity. Using a negative value is an out-of-bounds
access by definition.

2)
On x86_64, unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added to or subtracted from pointers
are preferred to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

	void f(long *p, int i)
	{
		g(p[i]);
	}

  roughly translates to

	movsx	rsi, esi
	mov	rdi, [rsi+...]
	call 	g

MOVSX is a 3-byte instruction which isn't necessary if the variable is
unsigned, because x86_64 zero-extends by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

	static inline void *net_generic(const struct net *net, int id)
	{
		...
		ptr = ng->ptr[id - 1];
		...
	}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a seemingly random artefact of code generation with the register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires a REX
prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
used, which is longer than [r8].

However, overall balance is in negative direction:

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
	function                                     old     new   delta
	nfsd4_lock                                  3886    3959     +73
	tipc_link_build_proto_msg                   1096    1140     +44
	mac80211_hwsim_new_radio                    2776    2808     +32
	tipc_mon_rcv                                1032    1058     +26
	svcauth_gss_legacy_init                     1413    1429     +16
	tipc_bcbase_select_primary                   379     392     +13
	nfsd4_exchange_id                           1247    1260     +13
	nfsd4_setclientid_confirm                    782     793     +11
		...
	put_client_renew_locked                      494     480     -14
	ip_set_sockfn_get                            730     716     -14
	geneve_sock_add                              829     813     -16
	nfsd4_sequence_done                          721     703     -18
	nlmclnt_lookup_host                          708     686     -22
	nfsd4_lockt                                 1085    1063     -22
	nfs_get_client                              1077    1050     -27
	tcf_bpf_init                                1106    1076     -30
	nfsd4_encode_fattr                          5997    5930     -67
	Total: Before=154856051, After=154854321, chg -0.00%
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
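
The signed versus unsigned index argument above can be examined with a pair of tiny functions. The sketch below is not kernel code and the function names are hypothetical; compiling it with something like `gcc -O2 -S` on x86_64 and comparing the two bodies is one way to look for the sign-extension instruction, though the exact output depends on compiler and version.

```c
/* Signed index: the 32-bit value must be widened with sign extension
 * before it can be used in 64-bit address arithmetic. */
long load_signed(const long *p, int i)
{
	return p[i];
}

/* Unsigned index: widening is a zero extension, which x86_64 performs
 * implicitly when a 32-bit register is written. */
long load_unsigned(const long *p, unsigned int i)
{
	return p[i];
}
```

For valid, non-negative indices both functions return the same element; only the generated code differs.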

udp: enable busy polling for all sockets · e68b6e50

Committed by Eric Dumazet on Nov 16, 2016

UDP busy polling is restricted to connected UDP sockets.

This is because sk_busy_loop() only takes care of one NAPI context.

There are cases where it could be extended.

1) Some hosts receive traffic on a single NIC, with one RX queue.

2) Some applications use SO_REUSEPORT and associated BPF filter
   to split the incoming traffic on one UDP socket per RX
   queue/thread/cpu

3) Some UDP sockets are used to send/receive traffic for one flow, but
   they do not bother with connect()

This patch records the napi_id of first received skb, giving more
reach to busy polling.

Tested:

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read

lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done

Before patch :
   27867   28870   37324   41060   41215
   36764   36838   44455   41282   43843
After patch :
   73920   73213   70147   74845   71697
   68315   68028   75219   70082   73707
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
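
The idea of recording the napi_id of the first received skb can be sketched in user-space C as below. The structures and the helper are hypothetical simplifications and do not mirror the kernel's struct sock or struct sk_buff; the sketch also assumes that 0 means "no NAPI id recorded yet".

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for socket and packet. */
struct sketch_sock {
	uint32_t napi_id; /* 0 = nothing recorded yet (assumption) */
};

struct sketch_skb {
	uint32_t napi_id; /* NAPI context the packet arrived on */
};

/* Remember the NAPI id of the first packet only; later packets from
 * other queues do not overwrite it. */
static void mark_napi_id_once(struct sketch_sock *sk,
			      const struct sketch_skb *skb)
{
	if (sk->napi_id == 0)
		sk->napi_id = skb->napi_id;
}
```

A busy-polling loop could then poll the recorded NAPI context even though the socket was never connected.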

Merge branch 'rds-ha-failover-fixes' · fcd2b0da

Committed by David S. Miller on Nov 17, 2016

Sowmini Varadhan says:

====================
RDS: TCP: HA/Failover fixes

This series contains a set of fixes for bugs exposed when
we ran the following in a loop between a test machine pair:

 while (1); do
   # modprobe rds-tcp on test nodes
   # run rds-stress in bi-dir mode between test machine pair
   # modprobe -r rds-tcp on test nodes
 done

rds-stress in bi-dir mode will cause both nodes to initiate
RDS-TCP connections at almost the same instant, exposing the
bugs fixed in this series.

Without the fixes, rds-stress reports sporadic packet drops,
and packets arriving out of sequence. After the fixes, we have
been able to run the test overnight, without any issues.

Each patch has a detailed description of the root-cause fixed
by the patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: Force every connection to be initiated by numerically smaller IP address · 1a0e100f

Committed by Sowmini Varadhan on Nov 16, 2016

When 2 RDS peers initiate an RDS-TCP connection simultaneously,
there is a potential for "duelling syns" on either/both sides.
See commit 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") for a description of this
condition, and the arbitration logic which ensures that the
numerically larger IP address in the TCP connection is bound to the
RDS_TCP_PORT ("canonical ordering").

The rds_connection should not be marked as RDS_CONN_UP until the
arbitration logic has converged, for the following reason. The sender
may start transmitting RDS datagrams as soon as RDS_CONN_UP is set,
and the sender removes all datagrams from the rds_connection's
cp_retrans queue based on TCP acks. If the TCP ack was sent from
a tcp socket that got reset as part of duel arbitration (but
before data was delivered to the receiver's RDS socket layer),
the sender may end up prematurely freeing the datagram, and
the datagram is no longer reliably deliverable.

This patch remedies that condition by making sure that, upon
receipt of 3WH completion state change notification of TCP_ESTABLISHED
in rds_tcp_state_change, we mark the rds_connection as RDS_CONN_UP
if, and only if, the IP addresses and ports for the connection are
canonically ordered. In all other cases, rds_tcp_state_change will
force an rds_conn_path_drop(), and rds_queue_reconnect() on
both peers will restart the connection to ensure canonical ordering.

A side-effect of enforcing this condition in rds_tcp_state_change()
is that rds_tcp_accept_one_path() can now be refactored for simplicity.
It is also no longer possible to encounter an RDS_CONN_UP connection in
the arbitration logic in rds_tcp_accept_one().
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
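
The canonical-ordering rule from the commit title (the numerically smaller IP address initiates the connection) can be sketched as a simple comparison. `should_initiate()` is a hypothetical helper written for this illustration, assuming IPv4 addresses in network byte order; it is not the check used in the RDS code.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the local peer has the numerically
 * smaller IPv4 address and should therefore be the active opener.
 * Both arguments are in network byte order, so they are converted to
 * host order before the numeric comparison. */
static bool should_initiate(uint32_t local_be, uint32_t peer_be)
{
	return ntohl(local_be) < ntohl(peer_be);
}
```

For example, between 10.0.0.1 and 10.0.0.2 only the 10.0.0.1 side would initiate, so both peers agree on a single direction and the simultaneous-connect race cannot leave two competing sockets.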