提交 · 56ade8fe3fe1e134783f61d164305107ae01030f · openeuler / raspberrypi-kernel

16 10月, 2015 40 次提交

mlxsw: spectrum: Add initial support for Spectrum ASIC · 56ade8fe

由 Jiri Pirko 提交于 10月 16, 2015

Add support for new generation Mellanox Spectrum ASIC, 10/25/40/50 and
100Gb/s Ethernet Switch.

The initial driver implements bridge forwarding offload including
bridge internal VLAN support, FDB static entries, FDB learning and
HW ageing including their setup.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NElad Raz <eladr@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56ade8fe

mlxsw: reg: Add Switch Port VLAN MAC Learning register definition · a4feea74

由 Ido Schimmel 提交于 10月 16, 2015

Since we currently do not support the offloading of 802.1D bridges, we
need to be able to let the device know it should not learn MAC addresses
on specific {Port, VID} pairs.

Add the SPVMLR register, which controls the learning enablement of
{Port, VID} pairs.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a4feea74

mlxsw: reg: Add Switch Filtering Database Aging Time register definition · e534a56a

由 Jiri Pirko 提交于 10月 16, 2015

Add SFDAT which is used to control switch ageing time.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e534a56a

mlxsw: reg: Add Switch Virtual-Port Enabling register definition · 1f65da74

由 Ido Schimmel 提交于 10月 16, 2015

In order for a port to support {Port, VID} to FID mapping it needs to be
configured to a virtual port mode (as opposed to VLAN mode).

Add the SVPE register, which enables port virtualization.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f65da74

mlxsw: reg: Add Switch VID to FID Allocation register definition · 64790239

由 Ido Schimmel 提交于 10月 16, 2015

An incoming packet can be classified into a filtering identifer (FID)
based on its VID or incoming port and VID ({Port, VID}).

Add the SVFA register, which controls this mapping.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

64790239

mlxsw: reg: Add Switch FID Management register definition · f1fb693a

由 Ido Schimmel 提交于 10月 16, 2015

Filtering identifiers (FIDs) are unique identifers of bridge instances
in the hardware.

Add the SFMR register, which is responsible for the creation and
configuration of these FIDs.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f1fb693a

mlxsw: reg: Add shared buffer configuration registers definitions · e0594369

由 Jiri Pirko 提交于 10月 16, 2015

Add definitions of SBPR, SBCM, SBPM, SBMM and PBMC registers that are
used to configure shared buffers.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0594369

mlxsw: reg: Add Switch Port VID and Switch Port VLAN Membership registers definitions · b2e345f9

由 Elad Raz 提交于 10月 16, 2015

Add SPVID and SPVM registers responsible for default port VID
configuration and VLAN membership of a port.
Signed-off-by: NElad Raz <eladr@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b2e345f9

mlxsw: reg: Add Switch FDB Notification register definition · f5d88f58

由 Jiri Pirko 提交于 10月 16, 2015

Add SFN register which is used to poll for newly added and aged-out FDB
entries.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5d88f58

mlxsw: reg: Add Switch Filtering Database register definition · 236033b3

由 Jiri Pirko 提交于 10月 16, 2015

Add the SFD register which is responsible for filtering database
manipulation, including static and dynamic FDB entries.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

236033b3

mlxsw: item: Add MLXSW_ITEM_BUF_INDEXED helper · d64b1592

由 Jiri Pirko 提交于 10月 16, 2015

Add missing item helper which allows to access char bufs on multiple
offsets. This is needed by SFD and SFN register definitions.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d64b1592

J
mlxsw: item: Make src arg of memcpy_to helper const · 7b0989b5
由 Jiri Pirko 提交于 10月 16, 2015
```
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
7b0989b5

mlxsw: cmd: Introduce FID-offset flooding tables · 12fd35ab

由 Ido Schimmel 提交于 10月 16, 2015

Packets destined to offloaded netdevs will be classified to FIDs in the
device and flooded in case of BUM.

The flooding table used is of type FID-offset, which allows one to
create different flooding domains for different FIDs and specify the
offset in the flooding table for each FID (not necessarily equal to FID
or VID).

Add support for this flooding table type, by exposing the configuration
of the number of tables from this type and their size.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12fd35ab

mlxsw: cmd: Introduce per-FID flooding tables · 453b6a8d

由 Ido Schimmel 提交于 10月 16, 2015

In the newly introduced Spectrum switch ASIC, packets destined to not
offloaded netdevs will be classified to special FIDs (vFIDs) in the
device and flooded to the CPU port.

The flooding table used is of type per-FID, which allows one to create
different flooding domains for different vFIDs.

While using a simple single-entry flood table is certainly sufficient at
this point, we do plan to offload 802.1D bridges involving VLAN
interfaces, thus making this change necessary.

Add support for this flooding table type, by exposing the configuration
of the number of tables from this type and their size.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

453b6a8d

mlxsw: Enable configuration of flooding domains · bc2055f8

由 Ido Schimmel 提交于 10月 16, 2015

As part of the introduction of L2 offloads, allow different ports to
join/leave the flooding domain, according to user configuration.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc2055f8

net: introduce pre-change upper device notifier · 573c7ba0

由 Jiri Pirko 提交于 10月 16, 2015

This newly introduced netdevice notifier is called before actual change
upper happens. That provides a possibility for notifier handlers to
know upper change will happen and react to it, including possibility to
forbid the change. That is valuable for drivers which can check if the
upper device linkage is supported and forbid that in case it is not.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

573c7ba0

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 125ecf4b

由 David S. Miller 提交于 10月 16, 2015

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-10-16

This series contains updates to e1000, e1000e, igb, igbvf, ixgbe, ixgbevf,
i40e, i40evf and fm10k.

Alex Duyck fixes the polling routine for i40e/i40evf were the NAPI budget
for receive cleanup was being rounded up to 1 but the netpoll call was
expecting no Rx to be processed as the budget passed was 0.  Also cleaned
up IN_NETPOLL flag that was not adding any value due to the receive
cleanup was handled in NAPI.  Added support for netpoll for i40evf as
well.

Jesse updates all of our drivers to use napi_complete_done() instead of
napi_complete(), which allows us to use
/sys/class/net/ethX/gro_flush_timeout.  Added ethtool support to control
and report the new Interrupt Limit register, since the XL710 hardware
has a different interrupt moderation design that can support a limit of
total interrupts per second per vector.

Shannon cleans up startup log entries to cut down the number by putting
a couple behind debug flags and combining others into single line.  Added
support to enable/disable printing VEB statistics via ethtool.

Jingjing fixes a compile issue by adding const to functions that return
strings that are not going to be modified.

Greg Rose cleans up defines that were not used and were causing customer
confusion.

Greg Bowers adds support for setting a new bit in the Set Local LLDP MIB
admin queue command Type field.

Mitch fixes an issue where vlan_features field was set to the same value
as netdev features field, but before the features were actually being
set up, leaving the vlan_features empty.  Resolve the issue by setting
up the netdev features first, then mask out the VLAN feature bits when
assigning vlan_features.  Fixed VF init timing, where in some instances
the VFs would fail to initialize the first time you loaded the driver.
To correct this, increased the delay time for the init task and wait
longer before giving up.

v2: fix missing space in function header comment in patch 3, based on
    feedback from Sergei Shtylyov.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

125ecf4b

i40e/i40evf: Bump i40e to 1.3.34 and i40evf to 1.3.21 · d1d39516

由 Catherine Sullivan 提交于 9月 28, 2015

Bump.

Change-ID: I7ec818a507554648675b9b245ced9e6b6bd9ed4e
Signed-off-by: NCatherine Sullivan <catherine.sullivan@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

d1d39516

i40e: increase AQ work limit · 628f096d

由 Mitch Williams 提交于 9月 28, 2015

With 64 VFs, we can easily overwhelm the AQ on the PF if we have too low
a limit on the number of AQ requests. This leads to ARQ overflow errors,
and occasionally VFs that fail to initialize.

Since we really only hit this condition on initial VF driver load, the
requests that we process are lightweight, so this extra work doesn't
cause problems for the PF driver.

Change-ID: I620221520d8af987df6ace9ba938ffaf22107681
Signed-off-by: NMitch Williams <mitch.a.williams@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

628f096d

i40evf: relax and stagger init timing a bit · 3f7e5c33

由 Mitch Williams 提交于 9月 28, 2015

On some devices, in some systems, in some configurations, the VFs would
fail to initialize the first time you loaded the driver.

To correct this, increase the delay time for the init task slightly, and
wait longer before giving up.

If we enable VFs and load the VF driver in the same kernel as the PF
driver, we can totally overwhelm the PF driver with AQ requests because
all of the instances try to initialize at the same time.

To help alleviate this, stagger the initial scheduling of the init task
using the PCIe function as a multiplier. We mask off the function to
only three bits so no instance has to wait too long.

With these two changes, initializing 128 VFs on a single device goes
from four minutes to just a few seconds.

Change-ID: If3d8720c1c4e838ab36d8781d9ec295a62380936
Signed-off-by: NMitch Williams <mitch.a.williams@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

3f7e5c33

i40e: Recognize 1000Base_T_Optical phy type when link is up · 48becae6

由 Catherine Sullivan 提交于 9月 28, 2015

1000Base_T_Optical got added to the function that figures out what
is supported when link is down but not when link is up. Add it in there
too so that we display the correct information.

Change-ID: I85ebcdfa7c02d898c44c673b1500552a53c8042e
Signed-off-by: NCatherine Sullivan <catherine.sullivan@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

48becae6

i40evf: correctly populate vlan_features · cc7e406c

由 Mitch Williams 提交于 9月 28, 2015

The vlan_features field was correctly being set to the same value as the
netdev features field. However, this was being done before the features
were actually being set up, leaving the vlan_features empty.

Also, after a reset, vlan_features will be incorrectly assigned the
previous netdev feature flags, which can contain VLAN feature bits. This
makes the VLAN code angry and will cause a stack dump.

To fix these issues, set up the netdev features first, then mask out the
VLAN feature bits when assigning vlan_features.

Change-ID: Ib0548869dc83cf6a841cb8697dd94c12359ba4d2
Signed-off-by: NMitch Williams <mitch.a.williams@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

cc7e406c

i40e: reset the invalid msg counter in vf when a valid msg is received · 5d38c93e

由 Jingjing Wu 提交于 9月 28, 2015

When the number of invalid messages from a VF is exceeded, the VF
will be disabled, due to the invalid messages.  This happens if
other VF drivers (like DPDK) send a message through the driver's
mailbox (aka virtchannel) interface, but the message is not
supported by the i40e pf driver, such as CONFIG_PROMISCUOUS_MODE.

This patch changes the num_invalid_msgs in struct i40e_vf to record
the continuous invalid msgs, and it will be reset when a valid msg
is received.

Change-ID: Iaec42fd3dcdd281476b3518be23261dd46fc3718
Signed-off-by: NJingjing Wu <jingjing.wu@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

5d38c93e

i40e/i40evf: moderate interrupts differently · ac26fc13

由 Jesse Brandeburg 提交于 9月 28, 2015

The XL710 hardware has a different interrupt moderation design
that can support a limit of total interrupts per second per
vector, in addition to the "number of interrupts per second"
controls already established in the driver.  This combination
of hardware features allows us to set very low default latency
settings but minimize the total CPU utilization by not
making too many interrupts, should the user desire.

The current driver implementation is still enabling the dynamic
moderation in the driver, and only using the rx/tx-usecs
limit in ethtool to limit the interrupt rate per second, by default.

The new code implemented in this patch
2) adds init/use of the new "Interrupt Limit" register
3) adds ethtool knob to control/report the limits above

Usage is ethtool -C ethx rx-usecs-high <value> Where <value> is number
of microseconds to create a rate of 1/N interrupts per second,
regardless of rx-usecs or tx-usecs values. Since there is a credit based
scheme in the hardware, the rx-usecs and tx-usecs can be configured for
very low latency for short bursts, but once the credit runs out the
refill rate on the credits is limited by rx-usecs-high.

Change-ID: I3a1075d3296123b0f4f50623c779b027af5b188d
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

ac26fc13

i40e: Add support for non-willing Apps · 947570e8

由 Greg Bowers 提交于 9月 28, 2015

Adds support for setting a new bit in the Set Local LLDP MIB AQ command
Type field.  When set to 1, the bit indicates to FW that Apps should be
treated as non-willing.  When 0, FW behaves as before.

Change-ID: I0d2101c1606c59c7188d3e6a0c7810e0f205233a
Signed-off-by: NGreg Bowers <gregory.j.bowers@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

947570e8

i40e: priv flag for controlling VEB stats · 1cdfd88f

由 Shannon Nelson 提交于 9月 28, 2015

Add an ethtool priv flag to enable and disable printing
the VEB statistics.

Change-ID: I7654054a3a73b08aa8310d94ee8fce6219107dd8
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

1cdfd88f

i40e: Removed unused defines · d9d17cf7

由 Greg Rose 提交于 9月 28, 2015

Two defines that are not used are causing customer confusion - remove
them.

Change-ID: Icef0325aca8e0f4fcdfc519e026bdd375e791200
Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

d9d17cf7

i40e: remove read/write failed messages from nvmupdate · 3c5c4205

由 Shannon Nelson 提交于 9月 28, 2015

Allow the nvmupdate application to decide when a read or write error
should be exposed to the user. Since the application needs to use
write probes to find the ReadOnly sections on a potentially unknown NVM
version in the HW and read probes to check the status of the last write,
some error messages are expected, but need not be shown to the users.
The driver doesn't know which are ignorable from real errors, so needs
to let the application make the decision.

Change-ID: I78fca8ab672bede11c10c820b83c26adfd536d03
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

3c5c4205

i40e/i40evf: Fix compile issue related to const string · 4e68adfe

由 Jingjing Wu 提交于 9月 28, 2015

Add const to functions that return strings that aren't going to be
modified. This addresses some reported compile complaints.

Change-ID: Ic56b1e814ab4d23a50480e7fdec652445f776ee8
Signed-off-by: NJingjing Wu <jingjing.wu@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

4e68adfe

i40e: generate fewer startup messages · 6dec1017

由 Shannon Nelson 提交于 9月 28, 2015

Cut down on the number of startup log entries by putting a couple behind
debug flags and combining a couple others into a single line.

Change-ID: I708089f086308f84d43f8b6f0e8a634a02d058fb
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

6dec1017

drivers/net/intel: use napi_complete_done() · 32b3e08f

由 Jesse Brandeburg 提交于 9月 24, 2015

As per Eric Dumazet's previous patches:
(see commit (24d2e4a5) - tg3: use napi_complete_done())

Quoting verbatim:
Using napi_complete_done() instead of napi_complete() allows
us to use /sys/class/net/ethX/gro_flush_timeout

GRO layer can aggregate more packets if the flush is delayed a bit,
without having to set too big coalescing parameters that impact
latencies.
</end quote>

Tested
configuration: low latency via ethtool -C ethx adaptive-rx off
				rx-usecs 10 adaptive-tx off tx-usecs 15
workload: streaming rx using netperf TCP_MAERTS

igb:
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result:  941.48 10^6bits/s over 1.000 seconds ending at 1440193171.589

Alignment      Offset         Bytes    Bytes       Recvs   Bytes    Sends
Local  Remote  Local  Remote  Xfered   Per                 Per
Recv   Send    Recv   Send             Recv (avg)          Send (avg)
    8       8      0       0 1176930056  1475.36    797726   16384.00  71905

MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result:  941.49 10^6bits/s over 0.997 seconds ending at 1440193142.763

Alignment      Offset         Bytes    Bytes       Recvs   Bytes    Sends
Local  Remote  Local  Remote  Xfered   Per                 Per
Recv   Send    Recv   Send             Recv (avg)          Send (avg)
    8       8      0       0 1175182320  50476.00     23282   16384.00  71816

i40e:
Hard to test because the traffic is incoming so fast (24Gb/s) that GRO
always receives 87kB, even at the highest interrupt rate.

Other drivers were only compile tested.
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

32b3e08f

i40evf: Add support for netpoll · 7709b4c1

由 Alexander Duyck 提交于 9月 24, 2015

Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

7709b4c1

i40e/i40evf: Drop useless "IN_NETPOLL" flag · 8b650359

由 Alexander Duyck 提交于 9月 24, 2015

The code in i40e and i40evf is using an "IN_NETPOLL" flag that has never
added any value due to the fact that the Rx clean-up is handled in NAPI.
As such the flag was set, the queue was scheduled via NAPI, and then polled
from the netpoll controller and if any Rx packets were processed the were
processed in the wrong context.

In addition the flag itself just added an unneeded conditional to the
hot-path so it can safely be dropped and save us a few instructions.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

8b650359

i40e/i40evf: Fix handling of napi budget · c67caceb

由 Alexander Duyck 提交于 9月 24, 2015

The polling routine for i40e was rounding up the budget for Rx cleanup to
1.  This is incorrect as the netpoll poll call is expecting no Rx to be
processed as the budget passed was 0.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

c67caceb

net: Fix suspicious RCU usage in fib_rebalance · 51161aa9

由 David Ahern 提交于 10月 14, 2015

This command:
  ip route add 192.168.1.0/24 nexthop via 10.2.1.5 dev eth1 nexthop via 10.2.2.5 dev eth2

generated this suspicious RCU usage message:

[ 63.249262]
[ 63.249939] ===============================
[ 63.251571] [ INFO: suspicious RCU usage. ]
[ 63.253250] 4.3.0-rc3+ #298 Not tainted
[ 63.254724] -------------------------------
[ 63.256401] ../include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage!
[ 63.259450]
[ 63.259450] other info that might help us debug this:
[ 63.259450]
[ 63.262297]
[ 63.262297] rcu_scheduler_active = 1, debug_locks = 1
[ 63.264647] 1 lock held by ip/2870:
[ 63.265896] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff813ebfb7>] rtnl_lock+0x12/0x14
[ 63.268858]
[ 63.268858] stack backtrace:
[ 63.270409] CPU: 4 PID: 2870 Comm: ip Not tainted 4.3.0-rc3+ #298
[ 63.272478] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 63.275745] 0000000000000001 ffff8800b8c9f8b8 ffffffff8125f73c ffff88013afcf301
[ 63.278185] ffff8800bab7a380 ffff8800b8c9f8e8 ffffffff8107bf30 ffff8800bb728000
[ 63.280634] ffff880139fe9a60 0000000000000000 ffff880139fe9a00 ffff8800b8c9f908
[ 63.283177] Call Trace:
[ 63.283959] [<ffffffff8125f73c>] dump_stack+0x4c/0x68
[ 63.285593] [<ffffffff8107bf30>] lockdep_rcu_suspicious+0xfa/0x103
[ 63.287500] [<ffffffff8144d752>] __in_dev_get_rcu+0x48/0x4f
[ 63.289169] [<ffffffff8144d797>] fib_rebalance+0x3e/0x127
[ 63.290753] [<ffffffff8144d986>] ? rcu_read_unlock+0x3e/0x5f
[ 63.292442] [<ffffffff8144ea45>] fib_create_info+0xaf9/0xdcc
[ 63.294093] [<ffffffff8106c12f>] ? sched_clock_local+0x12/0x75
[ 63.295791] [<ffffffff8145236a>] fib_table_insert+0x8c/0x451
[ 63.297493] [<ffffffff8144bf9c>] ? fib_get_table+0x36/0x43
[ 63.299109] [<ffffffff8144c3ca>] inet_rtm_newroute+0x43/0x51
[ 63.300709] [<ffffffff813ef684>] rtnetlink_rcv_msg+0x182/0x195
[ 63.302334] [<ffffffff8107d04c>] ? trace_hardirqs_on+0xd/0xf
[ 63.303888] [<ffffffff813ebfb7>] ? rtnl_lock+0x12/0x14
[ 63.305346] [<ffffffff813ef502>] ? __rtnl_unlock+0x12/0x12
[ 63.306878] [<ffffffff81407c4c>] netlink_rcv_skb+0x3d/0x90
[ 63.308437] [<ffffffff813ec00e>] rtnetlink_rcv+0x21/0x28
[ 63.309916] [<ffffffff81407742>] netlink_unicast+0xfa/0x17f
[ 63.311447] [<ffffffff81407a5e>] netlink_sendmsg+0x297/0x2dc
[ 63.313029] [<ffffffff813c6cd4>] sock_sendmsg_nosec+0x12/0x1d
[ 63.314597] [<ffffffff813c835b>] ___sys_sendmsg+0x196/0x21b
[ 63.316125] [<ffffffff8100bf9f>] ? native_sched_clock+0x1f/0x3c
[ 63.317671] [<ffffffff8106c12f>] ? sched_clock_local+0x12/0x75
[ 63.319185] [<ffffffff8106c397>] ? sched_clock_cpu+0x9d/0xb6
[ 63.320693] [<ffffffff8107e2d7>] ? __lock_is_held+0x32/0x54
[ 63.322145] [<ffffffff81159fcb>] ? __fget_light+0x4b/0x77
[ 63.323541] [<ffffffff813c8726>] __sys_sendmsg+0x3d/0x5b
[ 63.324947] [<ffffffff813c8751>] SyS_sendmsg+0xd/0x19
[ 63.326274] [<ffffffff814c8f57>] entry_SYSCALL_64_fastpath+0x12/0x6f

It looks like all of the code paths to fib_rebalance are under rtnl.

Fixes: 0e884c78 ("ipv4: L3 hash-based multipath")
Cc: Peter Nørlund <pch@ordbogen.com>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51161aa9

bpf: Need to call bpf_prog_uncharge_memlock from bpf_prog_put · ac00737f

由 Tom Herbert 提交于 10月 14, 2015

Currently, is only called from __prog_put_rcu in the bpf_prog_release
path. Need this to call this from bpf_prog_put also to get correct
accounting.

Fixes: aaac3ba9 ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: NTom Herbert <tom@herbertland.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac00737f

Merge branch 'robust_listener' · a302afe9

由 David S. Miller 提交于 10月 16, 2015

Eric Dumazet says:

====================
tcp/dccp: make our listener code more robust

This patch series addresses request sockets leaks and listener dismantle
phase. This survives a stress test with listeners being added/removed
quite randomly.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a302afe9

tcp/dccp: fix race at listener dismantle phase · ebb516af

由 Eric Dumazet 提交于 10月 14, 2015

Under stress, a close() on a listener can trigger the
WARN_ON(sk->sk_ack_backlog) in inet_csk_listen_stop()

We need to test if listener is still active before queueing
a child in inet_csk_reqsk_queue_add()

Create a common inet_child_forget() helper, and use it
from inet_csk_reqsk_queue_add() and inet_csk_listen_stop()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ebb516af

tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper · f03f2e15

由 Eric Dumazet 提交于 10月 14, 2015

Let's reduce the confusion about inet_csk_reqsk_queue_drop() :
In many cases we also need to release reference on request socket,
so add a helper to do this, reducing code size and complexity.

Fixes: 4bdc3d66 ("tcp/dccp: fix behavior of stale SYN_RECV request sockets")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f03f2e15

Revert "inet: fix double request socket freeing" · ef84d8ce

由 Eric Dumazet 提交于 10月 14, 2015

This reverts commit c6973669.

At the time of above commit, tcp_req_err() and dccp_req_err()
were dead code, as SYN_RECV request sockets were not yet in ehash table.

Real bug was fixed later in a different commit.

We need to revert to not leak a refcount on request socket.

inet_csk_reqsk_queue_drop_and_put() will be added
in following commit to make clean inet_csk_reqsk_queue_drop()
does not release the reference owned by caller.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef84d8ce