Commits · 20ca7fb6e4d6e8c44442ac0b82c24e74de2bad02 · openeuler / raspberrypi-kernel

01 Apr, 2015: 4 commits

ptp: blackfin: convert to the 64 bit get/set time methods. · 20ca7fb6

Committed by Richard Cochran on Mar 29, 2015

The device uses a 64-bit nanoseconds register, so with this patch the
driver is ready for the year 2038.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
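To illustrate why a 64-bit nanosecond register is enough (this is not the driver's code), the sketch below splits such a count into seconds and nanoseconds using a 64-bit seconds field. `struct ts64`, `ns_to_ts64()` and `ts64_to_ns()` are hypothetical user-space stand-ins; the kernel's own type for this purpose is `struct timespec64`.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical user-space stand-in for the kernel's struct timespec64. */
struct ts64 {
	int64_t tv_sec;  /* 64-bit seconds: does not wrap in 2038 */
	long    tv_nsec; /* 0 .. 999999999 */
};

/* Split a 64-bit nanosecond count (e.g. a PHC counter) into sec + nsec. */
static struct ts64 ns_to_ts64(uint64_t ns)
{
	struct ts64 ts;

	ts.tv_sec  = (int64_t)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	return ts;
}

/* Inverse conversion, as a settime path would need before writing the counter. */
static uint64_t ts64_to_ns(const struct ts64 *ts)
{
	return (uint64_t)ts->tv_sec * NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}
```

2147483648 s is one past the largest value a signed 32-bit seconds field can hold (that limit falls on 19 January 2038); the 64-bit field stores it without wrapping.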

ptp: use the 64 bit get/set time methods for the posix clock. · d7d38f5b

Committed by Richard Cochran on Mar 29, 2015

This patch changes the posix clock code to prefer the new methods
whenever they are implemented by the PHC drivers.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
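The "prefer the new method whenever it is implemented" rule can be sketched as a function-pointer check with a fallback. The ops structure, the `gettime`/`gettime64` field names and the two example drivers below are hypothetical simplifications, not the kernel's `struct ptp_clock_info`; the same pattern would apply to the settime path.

```c
#include <stddef.h>
#include <stdint.h>

struct ts32 { int32_t tv_sec; long tv_nsec; }; /* old layout: 32-bit seconds */
struct ts64 { int64_t tv_sec; long tv_nsec; }; /* new layout: 64-bit seconds */

/* Hypothetical, trimmed-down ops table; a driver may fill either hook. */
struct clock_ops {
	int (*gettime)(struct ts32 *ts);   /* old method */
	int (*gettime64)(struct ts64 *ts); /* new method, may be NULL */
};

/* Prefer the 64-bit method; otherwise call the old one and widen. */
static int clock_get(const struct clock_ops *ops, struct ts64 *out)
{
	struct ts32 old;
	int err;

	if (ops->gettime64)
		return ops->gettime64(out);

	err = ops->gettime(&old);
	if (err)
		return err;
	out->tv_sec = old.tv_sec;
	out->tv_nsec = old.tv_nsec;
	return 0;
}

/* Two example drivers: one not yet converted, one converted. */
static int old_drv_gettime(struct ts32 *ts)
{
	ts->tv_sec = 100;
	ts->tv_nsec = 1;
	return 0;
}

static int new_drv_gettime64(struct ts64 *ts)
{
	ts->tv_sec = 5000000000LL; /* beyond 2038, needs 64 bits */
	ts->tv_nsec = 2;
	return 0;
}
```

Keeping both hooks during the transition lets unconverted drivers keep working while converted ones are used through the wider interface.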

ptp: use the 64 bit gettime method for the SYS_OFFSET ioctl. · e13cfcb0

Committed by Richard Cochran on Mar 29, 2015

This patch changes the code to use the new method whenever implemented by
the PHC driver.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ptp: introduce get/set time methods with explicit 64 bit seconds. · 92f17194

Committed by Richard Cochran on Mar 29, 2015

Converting the PHC drivers over to the new methods is one step along the
way to making them ready for 2038. Once all the drivers are up to date,
the old methods will be removed.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Mar, 2015: 36 commits

tipc: fix two bugs in secondary destination lookup · d482994f

Committed by Jon Paul Maloy on Mar 27, 2015

A message sent to a node after a successful name table lookup may still
find that the destination socket has disappeared, because distribution
of name table updates is non-atomic. If so, the message will be rejected
back to the sender with error code TIPC_ERR_NO_PORT. If the source
socket of the message has disappeared in the meantime, the message
should be dropped.

However, in the current code, the message will instead be subject to an
unwanted tertiary lookup, because the function tipc_msg_lookup_dest()
doesn't check whether an error code is present in the message before
performing the lookup. In the worst case, the message may find the
old destination again and be redirected once more, instead of being
dropped directly as it should be.

A second bug in this function is that the "prev_node" field in the message
is not updated after a successful lookup, which may have
unpredictable consequences.

The problems arising from these bugs occur very infrequently.

The third change in this function, to the test on msg_reroute_msg_cnt(), is
purely cosmetic, reflecting that the returned value can never be negative.

This commit corrects the two bugs described above.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
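The two fixes amount to an early return when an error code is present and an extra field update after a successful lookup. A minimal sketch, assuming a heavily simplified, hypothetical `struct msg` and lookup callback; this is not the actual `tipc_msg_lookup_dest()`, and the one-reroute limit is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified message header. */
struct msg {
	int      errcode;     /* nonzero once the message has been rejected */
	uint32_t dest_node;
	uint32_t prev_node;
	unsigned reroute_cnt; /* unsigned: the count can never be negative */
};

/* Hypothetical name-table lookup: returns a node address, or 0 if none. */
typedef uint32_t (*lookup_fn)(const struct msg *m);

/* Try one secondary lookup. Returns true if the message was redirected,
 * false if the caller should drop or reject it instead. */
static bool msg_lookup_dest(struct msg *m, uint32_t own_node, lookup_fn lookup)
{
	uint32_t node;

	if (m->errcode)      /* fix 1: never re-route an already rejected message */
		return false;
	if (m->reroute_cnt)  /* this sketch allows only one secondary lookup */
		return false;
	node = lookup(m);
	if (!node)
		return false;
	m->dest_node = node;
	m->prev_node = own_node; /* fix 2: record this node as the previous hop */
	m->reroute_cnt++;
	return true;
}

/* Example lookup that always succeeds (illustrative address). */
static uint32_t example_lookup(const struct msg *m)
{
	(void)m;
	return 0x1001002;
}
```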

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 4d92a3e9

Committed by David S. Miller on Mar 29, 2015

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-03-27

This series contains updates to i40e and i40evf.

Jesse adds new device IDs to handle the new 20G speed for KR2.

Mitch provides a fix for an issue that shows up as a panic or memory
corruption when the device is brought down while under heavy stress.
This is resolved by delaying the release of resources until we
receive acknowledgment from the PF driver that the rings have indeed
been stopped. He also adds firmware version information to ethtool
reporting to align with ixgbevf behavior.

Akeem increases the polling loop limiter, since we found that in
certain circumstances the firmware can take longer to be ready after
a reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'stacked_vlan_tso' · afb0bc97

Committed by David S. Miller on Mar 29, 2015

Toshiaki Makita says:

====================
Stacked vlan TSO

Based on the Netdev 0.1 discussion [1], I made a patch set to enable
TSO for packets with multiple vlans.

Currently, packets with multiple vlans are always segmented in software,
because netif_skb_features() drops most feature flags for multiple-tagged
packets.

To allow NICs to segment them, we need to remove that check from the core.
Fortunately, the recently introduced ndo_features_check() can be used to
move the check into each driver, and this patch set is based on that idea.

For the initial patch set, I chose three drivers, bonding, team, and igb, as
candidates to enable TSO. I tested them and confirmed that they work fine
with this change.

Here are sample performance test results. As expected, %sys is considerably
lower than before, while throughput is essentially unchanged.

* TEST1: vlan (.1Q) on vlan (.1ad) on igb (I350)

- before

$ netperf -t TCP_STREAM -H 192.168.10.1 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.02     933.72

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.13      0.00     11.28      0.01      0.00     88.58

- after

$ netperf -t TCP_STREAM -H 192.168.10.1 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.01     936.13

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.24      0.00      4.17      0.01      0.00     95.58

* TEST2: vlan (.1Q) on bridge (.1ad vlan filtering) on team on igb (I350)

- before

$ netperf -t TCP_STREAM -H 192.168.10.1 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.01     936.28

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.41      0.00     11.57      0.01      0.00     88.01

- after

$ netperf -t TCP_STREAM -H 192.168.10.1 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.02     935.72

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.14      0.00      7.66      0.01      0.00     92.19

In addition to the above, I tested these configurations:
- vlan (.1Q) on vlan (.1ad) on bonding on igb (I350)
- vlan (.1Q) on vlan (.1Q) on igb (I350)
- vlan (.1Q) on vlan (.1Q) on team on igb (I350)
and did not find any problems.

[1] https://netdev01.org/sessions/18
    https://netdev01.org/docs/netdev01_bof_8021ad_makita_150212.pdf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Enable TSO for stacked vlan · 1abbc98a

Committed by Toshiaki Makita on Mar 27, 2015

As the datasheets for igb (I210, I350, 82576, etc.) say, maclen can be from
14 to 127, which is enough for a reasonable number of vlan tags.
My netperf test showed that I350's TSO works fine with multiple vlans.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
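The maclen range quoted from the datasheets bounds the tag count with simple arithmetic: a 14-byte Ethernet header plus 4 bytes per VLAN tag must not exceed 127, so (127 - 14) / 4 = 28 tags fit. The macro and function names below are illustrative, not igb's.

```c
/* Illustrative constants; not igb's macro names. */
#define ETH_HDR_BYTES   14  /* destination MAC + source MAC + EtherType */
#define VLAN_TAG_BYTES   4  /* TPID + TCI per tag */
#define MAX_MACLEN     127  /* upper bound quoted from the datasheets */

/* MAC header length for a frame carrying `ntags` VLAN tags. */
static int maclen_for_tags(int ntags)
{
	return ETH_HDR_BYTES + VLAN_TAG_BYTES * ntags;
}

/* Largest number of VLAN tags whose header still fits in maclen. */
static int max_vlan_tags(void)
{
	return (MAX_MACLEN - ETH_HDR_BYTES) / VLAN_TAG_BYTES;
}
```

A double-tagged (QinQ) frame needs only 22 bytes, far below the limit.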

team: Don't segment multiple tagged packets on team device · b9f4cf75

Committed by Toshiaki Makita on Mar 27, 2015

Team devices don't need to segment multiple tagged packets since their
slaves can segment them.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Don't segment multiple tagged packets on bonding device · 4847f049

Committed by Toshiaki Makita on Mar 27, 2015

Bonding devices don't need to segment multiple tagged packets since their
slaves can segment them.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Introduce passthru_features_check · e38f3025

Committed by Toshiaki Makita on Mar 27, 2015

As there are a number of (especially virtual) devices that don't
need the multiple vlan check, introduce passthru_features_check() for
convenience.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Move check for multiple vlans to drivers · 8cb65d00

Committed by Toshiaki Makita on Mar 27, 2015

To allow drivers to handle the features check for multiple tags,
move the check to ndo_features_check().
As no driver currently handles TSO for multiple-tagged packets, introduce
dflt_features_check() and call it if the driver does not have
ndo_features_check().
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
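The resulting dispatch can be sketched as follows: the core calls the driver's hook if one exists and otherwise falls back to a default that strips offloads for multiple-tagged packets, while devices such as bonding and team can install a pass-through hook. The feature bits, the mask of features kept for multiple tags, and all names here are hypothetical simplifications of `ndo_features_check()`, `dflt_features_check()` and `passthru_features_check()`.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t features_t;

/* Hypothetical feature bits, for illustration only. */
#define F_SG      (1ULL << 0)
#define F_CSUM    (1ULL << 1)
#define F_TSO     (1ULL << 2)
#define F_HIGHDMA (1ULL << 3)

/* Hypothetical subset still allowed on multiple-tagged packets. */
#define F_MULTI_TAG_SAFE (F_SG | F_HIGHDMA)

struct pkt {
	int vlan_depth; /* number of VLAN tags on the packet */
};

typedef features_t (*features_check_fn)(const struct pkt *, features_t);

struct dev_ops {
	features_check_fn features_check; /* NULL if the driver has none */
};

/* Default: assume the NIC cannot offload multiple-tagged packets. */
static features_t dflt_check(const struct pkt *p, features_t f)
{
	if (p->vlan_depth > 1)
		f &= F_MULTI_TAG_SAFE;
	return f;
}

/* Pass-through: bonding/team leave the decision to their lower devices. */
static features_t passthru_check(const struct pkt *p, features_t f)
{
	(void)p;
	return f;
}

/* Core: prefer the driver's hook, else apply the default check. */
static features_t skb_features(const struct dev_ops *ops,
			       const struct pkt *p, features_t f)
{
	return ops->features_check ? ops->features_check(p, f) : dflt_check(p, f);
}
```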

vlan: Introduce helper functions to check if skb is tagged · f5a7fb88

Committed by Toshiaki Makita on Mar 27, 2015

Separate the two checks for single vlan and multiple vlans in
netif_skb_features().  This allows us to move the check for multiple
vlans to another function later.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
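Distinguishing a single-tagged frame from a multiple-tagged one comes down to counting consecutive VLAN TPIDs after the MAC addresses. The sketch below works on raw frame bytes only and its function names are hypothetical; the kernel helpers operate on an skb, whose outermost tag may also be carried out of band, which this does not model. The TPID values 0x8100 (802.1Q) and 0x88A8 (802.1ad) are the standard ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100 /* 802.1Q customer tag TPID */
#define ETH_P_8021AD 0x88A8 /* 802.1ad service tag TPID */

static bool is_vlan_tpid(uint16_t type)
{
	return type == ETH_P_8021Q || type == ETH_P_8021AD;
}

/* Count the VLAN tags that directly follow the MAC addresses. */
static int count_vlan_tags(const uint8_t *frame, size_t len)
{
	size_t off = 12; /* TPID/EtherType follows two 6-byte MAC addresses */
	int n = 0;

	while (off + 2 <= len) {
		uint16_t type = (uint16_t)((frame[off] << 8) | frame[off + 1]);

		if (!is_vlan_tpid(type))
			break;
		n++;
		off += 4; /* skip TPID (2 bytes) and TCI (2 bytes) */
	}
	return n;
}

static bool frame_is_tagged(const uint8_t *frame, size_t len)
{
	return count_vlan_tags(frame, len) >= 1;
}

static bool frame_is_tagged_multi(const uint8_t *frame, size_t len)
{
	return count_vlan_tags(frame, len) >= 2;
}
```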

vlan: Add features for stacked vlan device · 8d463504

Committed by Toshiaki Makita on Mar 27, 2015

Stacked vlan devices currently have few features (GRO, HIGHDMA, LLTX).
Since we have software fallbacks in case the NIC cannot handle some
features for multiple vlans, we can give stacked vlan devices the same
features as the lower vlan devices.

This allows stacked vlan devices to create large (GSO) packets instead of
segmenting them. Those packets will be segmented in software on the real
device, or can even be segmented by the NIC once TSO for multiple vlans
is enabled by the following patches.

The exceptions are the FCoE-related features, which have no software
fallback.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
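The feature rule described above can be expressed as a mask operation. A minimal sketch with hypothetical flag values (real netdev feature bits differ) and a hypothetical `stacked_vlan_features()` helper:

```c
#include <stdint.h>

/* Hypothetical feature bits; real netdev feature values differ. */
#define F_GRO      (1u << 0)
#define F_HIGHDMA  (1u << 1)
#define F_LLTX     (1u << 2)
#define F_TSO      (1u << 3)
#define F_CSUM     (1u << 4)
#define F_FCOE_CRC (1u << 5)
#define F_FSO      (1u << 6)

/* FCoE offloads have no software fallback, so they are never inherited. */
#define F_FCOE_MASK (F_FCOE_CRC | F_FSO)

/* Features for a vlan device stacked on another vlan device: the few it
 * already had, plus everything from the lower vlan device except FCoE. */
static uint32_t stacked_vlan_features(uint32_t lower)
{
	return F_GRO | F_HIGHDMA | F_LLTX | (lower & ~F_FCOE_MASK);
}
```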

tc: bpf: generalize pedit action · 608cd71a

Committed by Alexei Starovoitov on Mar 26, 2015

The existing TC action 'pedit' can munge any bits of the packet.
Generalize it for use in bpf programs attached as cls_bpf and act_bpf via
the bpf_skb_store_bytes() helper function.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
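At its core, storing bytes into a packet is a bounds-checked copy at an offset. A minimal sketch on a hypothetical flat buffer; `struct pkt_buf` and `store_bytes()` are illustrative names, and the real helper also deals with skb internals (for example, non-linear data) that this ignores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat packet buffer. */
struct pkt_buf {
	uint8_t *data;
	size_t   len;
};

/* Overwrite `len` bytes at `offset` with the bytes at `from`.
 * Returns 0 on success, -1 if the write would run past the packet. */
static int store_bytes(struct pkt_buf *p, size_t offset, const void *from, size_t len)
{
	if (offset > p->len || len > p->len - offset) /* overflow-safe bounds check */
		return -1;
	memcpy(p->data + offset, from, len);
	return 0;
}
```

Checking `len > p->len - offset` after `offset > p->len` avoids the integer overflow that `offset + len > p->len` could hit.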

Merge branch 'dsa-hw-bridging' · 7836b16c

Committed by David S. Miller on Mar 29, 2015

Guenter Roeck says:

====================
net: dsa: HW bridging, EEE support

Patches 1 to 7 of this series prepare the drivers using the mv88e6xxx code
for HW bridging support, without adding the code itself. For the most part
this factors out common port initialization code. There is no functional
change except for patch 3, which disables the message port bit for the
CPU port to prevent packet duplication if HW bridging is configured.

Patch 8 adds the infrastructure for hardware bridging support to the
mv88e6xxx code.

Patch 9 wires the MV88E6352 driver to support hardware bridging.

Patches 10 to 12 add support for ndo_fdb functions to the dsa subsystem,
and wire up the MV88E6352 driver to support those functions.

Patches 13 to 16 add EEE support and HW bridging support to the mv88e6171
driver. This set of patches is from Andrew, applied on top of the first
set of patches.

The series applies to net-next as of 3/24/2015.

Thanks a lot to Andrew Lunn for testing and valuable feedback.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6171: Add support for hardware bridging · b2a6b93a

Committed by Andrew Lunn on Mar 26, 2015

Wire up the common code for setting up hardware bridging
and access to the forwarding database.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6171: Add EEE support to the mv88e6172 · baae51d5

Committed by Andrew Lunn on Mar 26, 2015

The mv88e6172 has support for EEE. Check for the product ID and call
the common code if applicable.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6171: Add defines for switch product IDs · 464caa2f

Committed by Andrew Lunn on Mar 26, 2015

Make the code more readable by using defines for the switch IDs.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Centralise getting switch id · a8f064c6

Committed by Andrew Lunn on Mar 26, 2015

Get the switch id and save it in the private mv88e6xxx structure
in a centralised piece of code, rather than each driver doing it itself.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6352: Add support for ndo_fdb functions · 4f431e56

Committed by Guenter Roeck on Mar 26, 2015

Add support for manipulating switch fdb entries by pointing to the
ndo_fdb functions implemented for mv88e6xxx.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Add support for fdb_add, fdb_del, and fdb_getnext · defb05b9

Committed by Guenter Roeck on Mar 26, 2015

No vlan support at this time.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Add basic framework to support ndo_fdb functions · 339d8262

Committed by Guenter Roeck on Mar 26, 2015

Provide callbacks for ndo_fdb_add, ndo_fdb_del, and ndo_fdb_dump.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6352: Add support for hardware bridging · 3f244abb

Committed by Guenter Roeck on Mar 26, 2015

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Add Hardware bridging support · facd95b2

Committed by Guenter Roeck on Mar 26, 2015

Bridge support is similar for all chips supported by the mv88e6xxx code,
so add the code there.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6171: Use common port configuration · b0019b70

Committed by Guenter Roeck on Mar 26, 2015

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6123_61_65: Use common port configuration · 54af0cf0

Committed by Guenter Roeck on Mar 26, 2015

This will simplify adding offloaded bridge support later on.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6352: Use common port initialization code · 2089052f

Committed by Guenter Roeck on Mar 26, 2015

This prepares the driver for hardware bridging.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Split mv88e6xxx_reg_read and mv88e6xxx_reg_write · 8d6d09e7

Committed by Guenter Roeck on Mar 26, 2015

Split mv88e6xxx_reg_read and mv88e6xxx_reg_write into two functions each:
one to acquire smi_mutex, and one to get struct mii_bus *bus from
struct dsa_switch *ds and call the actual read/write function.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
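The split follows a common locking pattern: an unlocked worker that expects the caller to hold the mutex, plus a thin wrapper that takes it. This lets a caller perform several accesses (for example, a read-modify-write) under one lock hold. A user-space sketch with pthreads, where the register array stands in for the MDIO bus and every name is hypothetical.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical switch: a register file guarded by an SMI mutex. */
struct sw {
	pthread_mutex_t smi_mutex;
	uint16_t regs[32][32]; /* [addr][reg]; stands in for the MDIO bus */
};

static void sw_init(struct sw *s)
{
	memset(s->regs, 0, sizeof(s->regs));
	pthread_mutex_init(&s->smi_mutex, NULL);
}

/* Unlocked accessors: the caller must already hold smi_mutex. */
static int reg_read_unlocked(struct sw *s, int addr, int reg)
{
	return s->regs[addr][reg];
}

static int reg_write_unlocked(struct sw *s, int addr, int reg, uint16_t val)
{
	s->regs[addr][reg] = val;
	return 0;
}

/* Locked wrappers for callers that need a single access. */
static int reg_read(struct sw *s, int addr, int reg)
{
	int ret;

	pthread_mutex_lock(&s->smi_mutex);
	ret = reg_read_unlocked(s, addr, reg);
	pthread_mutex_unlock(&s->smi_mutex);
	return ret;
}

static int reg_write(struct sw *s, int addr, int reg, uint16_t val)
{
	int ret;

	pthread_mutex_lock(&s->smi_mutex);
	ret = reg_write_unlocked(s, addr, reg, val);
	pthread_mutex_unlock(&s->smi_mutex);
	return ret;
}

/* Read-modify-write under a single lock hold, built from the unlocked calls. */
static int reg_set_bits(struct sw *s, int addr, int reg, uint16_t bits)
{
	int ret;

	pthread_mutex_lock(&s->smi_mutex);
	ret = reg_read_unlocked(s, addr, reg);
	if (ret >= 0)
		ret = reg_write_unlocked(s, addr, reg, (uint16_t)(ret | bits));
	pthread_mutex_unlock(&s->smi_mutex);
	return ret;
}
```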

net: dsa: mv88e6xxx: Disable Message Port bit for CPU port · 366f0a0f

Committed by Guenter Roeck on Mar 26, 2015

The datasheet says that the Message Port bit should not be set for the CPU port.
Having it set causes DSA-tagged packets to be sent to the CPU port roughly
every 30 seconds. Those packets are the same as the real packets forwarded between
switch ports if the switch is configured for switching between multiple ports.
The packets are then bridged by the software bridge, resulting in duplicated
packets on the network.
Reported-by: Andrew Lunn <andrew@lunn.ch>
Cc: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Provide function for common port initialization · d827e88a

Committed by Guenter Roeck on Mar 26, 2015

Provide mv88e6xxx_setup_port_common() for common port initialization.
For now, it writes only the Port 1 Control and VLAN configuration, since
these will be needed for hardware bridging. More can be added later
if desired or needed.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Factor out common initialization code · acdaffcc

Committed by Guenter Roeck on Mar 26, 2015

Code used and needed in mv88e6xxx.c should be initialized there as well,
so factor it out of the individual initialization files.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: remove vmbus_are_subchannels_present() in rndis_filter_device_add() · 5ce58c2f

Committed by Haiyang Zhang on Mar 26, 2015

vmbus_are_subchannels_present() also involves opening the channels, which
may be too early at this point, and checking for subchannels is not necessary
here. So this patch removes the call; subchannels will be opened when their
offer messages arrive.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: smc91x: make use of 4th parameter to devm_gpiod_get_index · cb6e0b36

Committed by Uwe Kleine-König on Mar 26, 2015

Since 39b2bbe3 ("gpio: add flags argument to gpiod_get*() functions"),
which appeared in v3.17-rc1, the gpiod_get* functions take an additional
parameter that allows specifying the direction and, for outputs, the
initial value. Simplify accordingly.

Moreover, use devm_gpiod_get_index_optional for even simpler handling.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Implement batching in send buffer · 7c3877f2

Committed by Haiyang Zhang on Mar 26, 2015

With this patch, we can send multiple RNDIS data packets in one send buffer
slot and one VMBus message. This reduces the overhead associated with VMBus messages.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
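The batching idea can be sketched as packing packets into a fixed-size slot and sending one message only when the next packet would not fit, or on an explicit flush. The slot size, structure and function names are hypothetical, and the real driver's alignment and RNDIS framing are not modeled.

```c
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 64 /* hypothetical send-buffer slot size in bytes */

struct batch {
	unsigned char slot[SLOT_SIZE];
	size_t used;   /* bytes packed into the current slot */
	int npkts;     /* packets packed into the current slot */
	int messages;  /* "host messages" sent so far */
};

/* Send the current slot as one message (here: just count it) and reset it. */
static void batch_flush(struct batch *b)
{
	if (b->npkts == 0)
		return;
	b->messages++;
	b->used = 0;
	b->npkts = 0;
}

/* Append one packet, flushing first if it would not fit.
 * Returns -1 for a packet larger than a whole slot. */
static int batch_send(struct batch *b, const void *pkt, size_t len)
{
	if (len > SLOT_SIZE)
		return -1;
	if (b->used + len > SLOT_SIZE)
		batch_flush(b);
	memcpy(b->slot + b->used, pkt, len);
	b->used += len;
	b->npkts++;
	return 0;
}
```

With a 64-byte slot, six 20-byte packets need two messages instead of six.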

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 4ef295e0

Committed by David S. Miller on Mar 29, 2015

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree.
Basically, these are nf_tables updates from Patrick McHardy to add the set extension
infrastructure and complete the transaction support for sets. More specifically, they are:

1) Move netns to basechain and use recently added possible_net_t, from
   Patrick McHardy.

2) Use LOGLEVEL_<FOO> from nf_log infrastructure, from Joe Perches.

3) Restore nf_log_trace that was accidentally removed during conflict
   resolution.

4) nft_queue does not depend on NETFILTER_XTABLES. From here on,
   all patches are from Patrick McHardy.

5) Use raw_smp_processor_id() in nft_meta.

Then, several patches to prepare the ground for the new set extension
infrastructure:

6) Pass object length to the hash callback in rhashtable as needed by
   the new set extension infrastructure.

7) Cleanup patch to restore struct nft_hash as a wrapper for struct
   rhashtable.

8) Another small source code readability cleanup for nft_hash.

9) Convert nft_hash to rhashtable callbacks.

And finally...

10) Add the new set extension infrastructure.

11) Convert the nft_hash and nft_rbtree sets to use it.

12) Batch set element release to avoid several RCU grace periods in a row
    and add a new function nft_set_elem_destroy() to consolidate set element
    release.

13) Return the set extension data area from nft_lookup.

14) Refactor existing transaction code to add some helper functions
    and document it.

15) Complete the set transaction support, using an approach similar to the one
    we already use, to activate/deactivate elements atomically.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc-next' · ae7633c8

Committed by David S. Miller on Mar 29, 2015

Ying Xue says:

====================
tipc: fix two corner issues

The patch set aims at resolving the following two critical issues:

Patch #1: Resolve a deadlock which happens while all links are reset
Patch #2: Correct a mistake usage of RCU lock which is used to protect
          node list
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

tipc: involve reference counter for node structure · 8a0f6ebe

Committed by Ying Xue on Mar 26, 2015

The TIPC node hash table is protected by RCU on the read side.
tipc_node_find() looks up a node object by node address by iterating
over the hash table. As the entire traversal in tipc_node_find() is
guarded by the RCU read lock, the lookup itself is safe. However, callers
then use the node object returned by tipc_node_find() without holding
the RCU read lock, which is unsafe.

Now we introduce a reference counter for the node structure. Before
tipc_node_find() returns a node object to its caller, it first increments
the reference counter. Accordingly, once the caller has finished with the
node, it decrements the counter again. This prevents a node that is being
used by one thread from being freed by another thread.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
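The get/put discipline described above can be sketched with C11 atomics. The list, the `node_find()`/`node_put()`/`node_delete()` names and the free counter are hypothetical; in the kernel the lookup would run under `rcu_read_lock()` and a kernel refcount type would be used instead of `stdatomic.h`.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified node object with a reference counter. */
struct node {
	uint32_t addr;
	atomic_int refcnt;
	struct node *next;
};

static struct node *node_list; /* stands in for the node hash table */
static int nodes_freed;        /* counts frees, for demonstration only */

/* Create a node and link it into the table; the table owns one reference. */
static struct node *node_create(uint32_t addr)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->addr = addr;
	atomic_init(&n->refcnt, 1);
	n->next = node_list;
	node_list = n;
	return n;
}

/* Drop one reference; the last put frees the node. */
static void node_put(struct node *n)
{
	if (atomic_fetch_sub(&n->refcnt, 1) == 1) {
		free(n);
		nodes_freed++;
	}
}

/* Find a node by address and take a reference before returning it. */
static struct node *node_find(uint32_t addr)
{
	for (struct node *n = node_list; n; n = n->next) {
		if (n->addr == addr) {
			atomic_fetch_add(&n->refcnt, 1);
			return n;
		}
	}
	return NULL;
}

/* Unlink a node from the table and drop the table's reference. */
static void node_delete(struct node *victim)
{
	struct node **pp = &node_list;

	while (*pp && *pp != victim)
		pp = &(*pp)->next;
	if (*pp)
		*pp = victim->next;
	node_put(victim);
}
```

Because the table holds its own reference, deleting a node while a caller still holds one defers the free until that caller's put.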

tipc: fix potential deadlock when all links are reset · b952b2be

Committed by Ying Xue on Mar 26, 2015

[   60.988363] ======================================================
[   60.988754] [ INFO: possible circular locking dependency detected ]
[   60.989152] 3.19.0+ #194 Not tainted
[   60.989377] -------------------------------------------------------
[   60.989781] swapper/3/0 is trying to acquire lock:
[   60.990079]  (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[   60.990743]
[   60.990743] but task is already holding lock:
[   60.991106]  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.991738]
[   60.991738] which lock already depends on the new lock.
[   60.991738]
[   60.992174]
[   60.992174] the existing dependency chain (in reverse order) is:
[   60.992174]
-> #1 (&(&bclink->lock)->rlock){+.-...}:
[   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[   60.992174]        [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.992174]        [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
[   60.992174]        [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
[   60.992174]        [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
[   60.992174]        [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
[   60.992174]        [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
[   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
[   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
[   60.992174]
-> #0 (&(&n_ptr->lock)->rlock){+.-...}:
[   60.992174]        [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
[   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[   60.992174]        [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[   60.992174]        [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
[   60.992174]        [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
[   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
[   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
[   60.992174]
[   60.992174] other info that might help us debug this:
[   60.992174]
[   60.992174]  Possible unsafe locking scenario:
[   60.992174]
[   60.992174]        CPU0                    CPU1
[   60.992174]        ----                    ----
[   60.992174]   lock(&(&bclink->lock)->rlock);
[   60.992174]                                lock(&(&n_ptr->lock)->rlock);
[   60.992174]                                lock(&(&bclink->lock)->rlock);
[   60.992174]   lock(&(&n_ptr->lock)->rlock);
[   60.992174]
[   60.992174]  *** DEADLOCK ***
[   60.992174]
[   60.992174] 3 locks held by swapper/3/0:
[   60.992174]  #0:  (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
[   60.992174]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
[   60.992174]  #2:  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.992174]

The correct order for grabbing n_ptr->lock and bclink->lock is to take
the former first and the latter second, which is exactly what happened on
CPU1. But when retransmission on the broadcast link fails, bclink->lock is
first held in tipc_bclink_rcv(), and n_ptr->lock is then taken in
link_retransmit_failure(), called by tipc_link_retransmit(), as shown on
CPU0. As a result, a deadlock occurs.

If the order in which CPU0 takes the two locks is reversed, the deadlock
risk is removed. Therefore, the node lock originally taken in
link_retransmit_failure() is moved to tipc_bclink_rcv() so that it is
obtained before the bclink lock. The precondition for this adjustment is
that the response to the bclink reset event must be moved from
tipc_bclink_unlock() to tipc_node_unlock().
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
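The fix boils down to one global ordering rule: every path takes the node lock before the bclink lock. A pthreads sketch, with hypothetical names standing in for `n_ptr->lock` and `bclink->lock`, in which the retransmit-failure path acquires the node lock up front instead of nesting it under the bclink lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for n_ptr->lock and bclink->lock. */
static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bclink_lock = PTHREAD_MUTEX_INITIALIZER;
static int retransmit_failures;

/* Broadcast receive path after the fix: the node lock is taken first, so
 * the retransmit-failure handling needs no nested node-lock acquisition. */
static void bclink_rcv_path(int retransmit_failed)
{
	pthread_mutex_lock(&node_lock);
	pthread_mutex_lock(&bclink_lock);
	if (retransmit_failed)
		retransmit_failures++; /* node lock already held here */
	pthread_mutex_unlock(&bclink_lock);
	pthread_mutex_unlock(&node_lock);
}

/* Link-up path (CPU1 in the report): node lock, then bclink lock. */
static void link_up_path(void)
{
	pthread_mutex_lock(&node_lock);
	pthread_mutex_lock(&bclink_lock);
	/* e.g. add the node to the broadcast link */
	pthread_mutex_unlock(&bclink_lock);
	pthread_mutex_unlock(&node_lock);
}

/* Run one path repeatedly; with a single global lock order, two threads
 * running different paths cannot deadlock. */
static void *worker(void *arg)
{
	for (int i = 0; i < 10000; i++) {
		if (arg)
			bclink_rcv_path(i % 2);
		else
			link_up_path();
	}
	return NULL;
}
```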

virtio: simplify the using of received in virtnet_poll · faadb05f

Committed by Li RongQing on Mar 26, 2015

received is always 0 at this point, so there is no need to subtract it
from the budget or to use "+=" to assign the result.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
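A before/after sketch of the simplification, with a hypothetical `receive()` standing in for the real receive routine; because `received` starts at 0, both forms compute the same value and consume the same packets.

```c
/* Hypothetical receive routine: takes up to `budget` packets from a ring
 * holding `*avail` packets and returns how many it took. */
static int receive(int *avail, int budget)
{
	int n = *avail < budget ? *avail : budget;

	*avail -= n;
	return n;
}

/* Before: received starts at 0, so "budget - received" and "+=" are no-ops. */
static int poll_before(int *avail, int budget)
{
	int received = 0;

	received += receive(avail, budget - received);
	return received;
}

/* After: plain assignment with the full budget. */
static int poll_after(int *avail, int budget)
{
	int received;

	received = receive(avail, budget);
	return received;
}
```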
