提交 · 7bedd0cfad4e122bc0ddaf3fc955a38c88c95d35 · openeuler / Kernel

01 4月, 2015 10 次提交

mac80211: use rhashtable for station table · 7bedd0cf

由 Johannes Berg 提交于 2月 13, 2015

We currently have a hand-rolled table with 256 entries and are
using the last byte of the MAC address as the hash. This hash
is obviously very fast, but collisions are easily created and
we waste a lot of space in the common case of just connecting
as a client to an AP where we just have a single station. The
other common case of an AP is also suboptimal due to the size
of the hash table and the ease of causing collisions.

Convert all of this to use rhashtable with jhash, which gives
us the advantage of a far better hash function (with random
perturbation to avoid hash collision attacks) and of course
that the hash table grows and shrinks dynamically with chain
length, improving both cases above.

Use a specialised hash function (using jhash, but with fixed
length) to achieve better compiler optimisation as suggested
by Sergey Ryazanov.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

7bedd0cf

net: rename dev to orig_dev in deliver_ptype_list_skb · fbcb2170

由 Jiri Pirko 提交于 3月 30, 2015

Unlike other places, this function uses name "dev" for what should be
"orig_dev", which might be a bit confusing. So fix this.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fbcb2170

netlink: implement nla_get_in_addr and nla_get_in6_addr · 67b61f6c

由 Jiri Benc 提交于 3月 29, 2015

Those are counterparts to nla_put_in_addr and nla_put_in6_addr.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67b61f6c

netlink: implement nla_put_in_addr and nla_put_in6_addr · 930345ea

由 Jiri Benc 提交于 3月 29, 2015

IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

930345ea

xfrm: simplify xfrm_address_t use · 15e318bd

由 Jiri Benc 提交于 3月 29, 2015

In many places, the a6 field is typecasted to struct in6_addr. As the
fields are in union anyway, just add in6_addr type to the union and
get rid of the typecasting.

Modifying the uapi header is okay, the union has still the same size.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

15e318bd

tcp: simplify inetpeer_addr_base use · 8f55db48

由 Jiri Benc 提交于 3月 29, 2015

In many places, the a6 field is typecasted to struct in6_addr. As the
fields are in union anyway, just add in6_addr type to the union and get rid
of the typecasting.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f55db48

ipv6: coding style: comparison for inequality with NULL · 53b24b8f

由 Ian Morris 提交于 3月 29, 2015

The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x != NULL and sometimes as x. x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.
Signed-off-by: NIan Morris <ipm@chirality.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

53b24b8f

ipv6: coding style: comparison for equality with NULL · 63159f29

由 Ian Morris 提交于 3月 29, 2015

The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x == NULL and sometimes as !x. !x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.
Signed-off-by: NIan Morris <ipm@chirality.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63159f29

fib_trie: Cleanup ip_fib_net_exit code path · 6e47d6ca

由 Alexander Duyck 提交于 3月 27, 2015

While fixing a recent issue I noticed that we are doing some unnecessary
work inside the loop for ip_fib_net_exit. As such I am pulling out the
initialization to NULL for the locally stored fib_local, fib_main, and
fib_default.

In addition I am restoring the original code for flushing the table as
there is no need to split up the fib_table_flush and hlist_del work since
the code for packing the tnodes with multiple key vectors was dropped.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e47d6ca

fib_trie: Fix warning on fib4_rules_exit · ad88d051

由 Alexander Duyck 提交于 3月 27, 2015

This fixes the following warning:

 BUG: sleeping function called from invalid context at mm/slub.c:1268
 in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u8:0
 INFO: lockdep is turned off.
 CPU: 3 PID: 6 Comm: kworker/u8:0 Tainted: G        W       4.0.0-rc5+ #895
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 Workqueue: netns cleanup_net
  0000000000000006 ffff88011953fa68 ffffffff81a203b6 000000002c3a2c39
  ffff88011952a680 ffff88011953fa98 ffffffff8109daf0 ffff8801186c6aa8
  ffffffff81fbc9e5 00000000000004f4 0000000000000000 ffff88011953fac8
 Call Trace:
  [<ffffffff81a203b6>] dump_stack+0x4c/0x65
  [<ffffffff8109daf0>] ___might_sleep+0x1c3/0x1cb
  [<ffffffff8109db70>] __might_sleep+0x78/0x80
  [<ffffffff8117a60e>] slab_pre_alloc_hook+0x31/0x8f
  [<ffffffff8117d4f6>] __kmalloc+0x69/0x14e
  [<ffffffff818ed0e1>] ? kzalloc.constprop.20+0xe/0x10
  [<ffffffff818ed0e1>] kzalloc.constprop.20+0xe/0x10
  [<ffffffff818ef622>] fib_trie_table+0x27/0x8b
  [<ffffffff818ef6bd>] fib_trie_unmerge+0x37/0x2a6
  [<ffffffff810b06e1>] ? arch_local_irq_save+0x9/0xc
  [<ffffffff818e9793>] fib_unmerge+0x2d/0xb3
  [<ffffffff818f5f56>] fib4_rule_delete+0x1f/0x52
  [<ffffffff817f1c3f>] ? fib_rules_unregister+0x30/0xb2
  [<ffffffff817f1c8b>] fib_rules_unregister+0x7c/0xb2
  [<ffffffff818f64a1>] fib4_rules_exit+0x15/0x18
  [<ffffffff818e8c0a>] ip_fib_net_exit+0x23/0xf2
  [<ffffffff818e91f8>] fib_net_exit+0x32/0x36
  [<ffffffff817c8352>] ops_exit_list+0x45/0x57
  [<ffffffff817c8d3d>] cleanup_net+0x13c/0x1cd
  [<ffffffff8108b05d>] process_one_work+0x255/0x4ad
  [<ffffffff8108af69>] ? process_one_work+0x161/0x4ad
  [<ffffffff8108b4b1>] worker_thread+0x1cd/0x2ab
  [<ffffffff8108b2e4>] ? process_scheduled_works+0x2f/0x2f
  [<ffffffff81090686>] kthread+0xd4/0xdc
  [<ffffffff8109ec8f>] ? local_clock+0x19/0x22
  [<ffffffff810905b2>] ? __kthread_parkme+0x83/0x83
  [<ffffffff81a2c0c8>] ret_from_fork+0x58/0x90
  [<ffffffff810905b2>] ? __kthread_parkme+0x83/0x83

The issue was that as a part of exiting the default rules were being
deleted which resulted in the local trie being unmerged.  By moving the
freeing of the FIB tables up we can avoid the unmerge since there is no
local table left when we call the fib4_rules_exit function.

Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
Reported-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad88d051

30 3月, 2015 30 次提交

mac80211: set QoS capability before changing station state · 2c44be81

由 Johannes Berg 提交于 3月 30, 2015

In the upcoming fast-xmit patch, changing station state will
build a header cache based on the station's capabilities, and
as the QoS capability (sta.wme) impacts the header, it needs
to be set before.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

2c44be81

mac80211: send HT/VHT IEs in TDLS discovery response · 070e176a

由 Arik Nemtsov 提交于 3月 30, 2015

These are mandated by IEEE802.11-2012 section 8.5.8.6 and IEEE802.11ac-2013
section 8.5.8.16.
Signed-off-by: NArik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

070e176a

mac80211: add VHT support for IBSS · abcff6ef

由 Janusz.Dziedzic@tieto.com 提交于 3月 20, 2015

Add VHT support for IBSS. Drivers could activate
this feature by setting NL80211_EXT_FEATURE_VHT_IBSS
flag.
Signed-off-by: NJanusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

abcff6ef

mac80211: IBSS fix scan request · 76bed0f4

由 Janusz.Dziedzic@tieto.com 提交于 3月 20, 2015

In case of wide bandwidth (wider than 20MHz) used by IBSS,
scan all channels in chandef to be able to find neighboring
IBSS netwqworks that use the same overall channels but a different
control channel.
Signed-off-by: NJanusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

76bed0f4

mac80211: factor out station lookup from ieee80211_build_hdr() · 97ffe757

由 Johannes Berg 提交于 3月 21, 2015

In order to look up the RA station earlier to implement a TX
fastpath, factor out the lookup from ieee80211_build_hdr().
To always have a valid station pointer, also move some of the
checks into the new function.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

97ffe757

mac80211: make sta.wme indicate whether QoS is used · 527871d7

由 Johannes Berg 提交于 3月 21, 2015

Indicating just the peer's capability is fairly pointless
if the local device doesn't support it. Make the variable
track both combined, and remove the 'local support' check
in the TX path.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

527871d7

mac80211: send AP probe as unicast again · a73f8e21

由 Johannes Berg 提交于 3月 21, 2015

Louis reported that a static checker was complaining that
the 'dst' variable was set (multiple times) but not used.
This is due to a previous commit having removed the usage
(apparently erroneously), so add it back.

Fixes: a344d677 ("mac80211: allow drivers to support NL80211_SCAN_FLAG_RANDOM_ADDR")
Reported-by: NLouis Langholtz <lou_langholtz@me.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

a73f8e21

mac80111: aes_gcm: clean up ieee80211_aes_gcm_key_setup_encrypt() · 07862e13

由 Johannes Berg 提交于 3月 23, 2015

This code is written using an anti-pattern called "success handling"
which makes it hard to read, especially if you are used to normal kernel
style.  It should instead be written as a list of directives in a row
with branches for error handling.

(Basically copied from Dan's previous patch for CCM)
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

07862e13

mac80111: aes_ccm: cleanup ieee80211_aes_key_setup_encrypt() · 45fd6329

由 Dan Carpenter 提交于 3月 23, 2015

This code is written using an anti-pattern called "success handling"
which makes it hard to read, especially if you are used to normal kernel
style.  It should instead be written as a list of directives in a row
with branches for error handling.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

45fd6329

mac80211: Fix misplaced return in AES-GMAC key setup · 82ca6ef6

由 Jouni Malinen 提交于 3月 23, 2015

Commit 8ade538b ("mac80111: Add BIP-GMAC-128 and BIP-GMAC-256
ciphers") had the success return in incorrect place before the
crypto_aead_setauthsize() call which practically ended up skipping that
call unconditionally.

The missing call did not actually change any functionality since
GMAC_MIC_LEN (16) is identical to the maxauthsize in gcm(aes) and as
such, the default value used for the authsize parameter.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NJouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

82ca6ef6

cfg80211: pass name_assign_type to rdev_add_virtual_intf() · 6bab2e19

由 Tom Gundersen 提交于 3月 18, 2015

This will expose in /sys whether the ifname of a device is set by
userspace or generated by the kernel. The latter kind (wlanX, etc)
is not deterministic, so userspace needs to rename these devices
to names that are guaranteed to stay the same between reboots. The
former, however should never be renamed, so userspace needs to be
able to reliably tell the difference.

Similar functionality was introduced for the rtnetlink core in
commit 5517750f ("net: rtnetlink - make create_link take name_assign_type")
Signed-off-by: NTom Gundersen <teg@jklm.no>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Brett Rudley <brudley@broadcom.com>
Cc: Arend van Spriel <arend@broadcom.com>
Cc: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Cc: Hante Meuleman <meuleman@broadcom.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
[reformat changelog to fit 72 cols]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

6bab2e19

mac80211: fix typo in debug output · 6a8b4adb

由 Michael Braun 提交于 3月 18, 2015

Signed-off-by: NMichael Braun <michael-dev@fami-braun.de>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

6a8b4adb

mac80211: reject aggregation sessions with non-HT peers · 8f9c77fc

由 Johannes Berg 提交于 3月 09, 2015

If a peer or some local agent (rate control, ...) decides to start
an aggregation session but doesn't support HT (which also implies
QoS), reject it.

This is mostly a corner case as such peers normally won't try to
use block-ack sessions and rate control wouldn't start them, but
technically QoS stations could request it according to the spec.

However, since drivers don't really support such non-HT sessions
it's better to reject them.

Also, while at it, move the tracing for TX sessions earlier so it
captures the error cases as well.
Reviewed-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

8f9c77fc

cfg/mac80211: add regulatory classes IE during TDLS setup · a38700dd

由 Arik Nemtsov 提交于 3月 18, 2015

Seems Broadcom TDLS peers (Nexus 5, Xperia Z3) refuse to allow TDLS
connection when channel-switching is supported but the regulatory
classes IE is missing from the setup request.
Add a chandef to reg-class translation function to cfg80211 and use it
to add the required IE during setup. For now add only the current
regulatory class as supported - it is enough to resolve the
compatibility issue.
Signed-off-by: NArik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

a38700dd

mac80211: stop scan before connection · 7d830a19

由 David Spinadel 提交于 3月 17, 2015

Stop scan before authentication or association to make sure
that nothing interferes with connection flow.

Currently mac80211 defers RX auth and assoc packets (among other ones)
until after the scan is complete, so auth during scan is likely to fail
if scan took too much time.
Signed-off-by: NDavid Spinadel <david.spinadel@intel.com>
Reviewed-by: NLuciano Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

7d830a19

nl80211: add net-detect delay to wowlan info · 21fea567

由 Luciano Coelho 提交于 3月 17, 2015

Pass the initial net-detect delay (NL80211_ATTR_SCHED_SCAN_DELAY)
attribute in the WoWLAN info response.

Additionally, remove a bogus TODO comment.
Signed-off-by: NLuciano Coelho <luciano.coelho@intel.com>
Reviewed-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

21fea567

mac80211: notify the driver about deauth · a90faa9d

由 Emmanuel Grumbach 提交于 3月 16, 2015

This can allow the driver to take action based on the reason
of the deauth.
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

a90faa9d

mac80211: notify the driver about association status · d0d1a12f

由 Emmanuel Grumbach 提交于 3月 16, 2015

This can allow the driver to take action based on the
success / failure of the association.
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

d0d1a12f

mac80211: notify the driver about authentication status · a9409093

由 Emmanuel Grumbach 提交于 3月 16, 2015

This can allow the driver to take action based on the
success / failure of the authentication.
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

a9409093

mac80211: convert rssi_callback() to event_callback() · a8182929

由 Emmanuel Grumbach 提交于 3月 16, 2015

We will be able to add more events, such as MLME events and
others. The low level driver may be interested in knowing
about these events to dump firmware data upon failures, or
to change parameters in case connection attempts fail etc...
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

a8182929

mac80211: agg-tx: avoid sending DelBA with sta->lock held · 2c158887

由 Johannes Berg 提交于 3月 12, 2015

The rate control locking caused a potential deadlock here due to the
locks being acquired in different orders, so that change cannot yet
be applied. However, there's no fundamental reason for this code to
hold the sta->lock while transmitting frames.

Clearly it's better not to hold the lock for longer periods of time,
which can happen here since we call all the way down to the driver.
Change the code a bit to not hold it while doing that.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

2c158887

tipc: fix two bugs in secondary destination lookup · d482994f

由 Jon Paul Maloy 提交于 3月 27, 2015

A message sent to a node after a successful name table lookup may still
find that the destination socket has disappeared, because distribution
of name table updates is non-atomic. If so, the message will be rejected
back to the sender with error code TIPC_ERR_NO_PORT. If the source
socket of the message has disappeared in the meantime, the message
should be dropped.

However, in the currrent code, the message will instead be subject to an
unwanted tertiary lookup, because the function tipc_msg_lookup_dest()
doesn't check if there is an error code present in the message before
performing the lookup. In the worst case, the message may now find the
old destination again, and be redirected once more, instead of being
dropped directly as it should be.

A second bug in this function is that the "prev_node" field in the message
is not updated after successful lookup, something that may have
unpredictable consequences.

The problems arising from those bugs occur very infrequently.

The third change in this function; the test on msg_reroute_msg_cnt() is
purely cosmetic, reflecting that the returned value never can be negative.

This commit corrects the two bugs described above.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d482994f

net: Introduce passthru_features_check · e38f3025

由 Toshiaki Makita 提交于 3月 27, 2015

As there are a number of (especially virtual) devices that don't
need the multiple vlan check, introduce passthru_features_check() for
convenience.
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e38f3025

net: Move check for multiple vlans to drivers · 8cb65d00

由 Toshiaki Makita 提交于 3月 27, 2015

To allow drivers to handle the features check for multiple tags,
move the check to ndo_features_check().
As no drivers currently handle multiple tagged TSO, introduce
dflt_features_check() and call it if the driver does not have
ndo_features_check().
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8cb65d00

vlan: Introduce helper functions to check if skb is tagged · f5a7fb88

由 Toshiaki Makita 提交于 3月 27, 2015

Separate the two checks for single vlan and multiple vlans in
netif_skb_features().  This allows us to move the check for multiple
vlans to another function later.
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5a7fb88

vlan: Add features for stacked vlan device · 8d463504

由 Toshiaki Makita 提交于 3月 27, 2015

Stacked vlan devices curretly have few features (GRO, HIGHDMA, LLTX).
Since we have software fallbacks in case the NIC can not handle some
features for multiple vlans, we can add the same features as the lower
vlan devices for stacked vlan devices.

This allows stacked vlan devices to create large (GSO) packets and not to
segment packets. Those packets will be segmented by software on the real
device, or even can be segmented by the NIC once TSO for multiple vlans
becomes enabled by the following patches.

The exception is those related to FCoE, which does not have a software
fallback.
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d463504

tc: bpf: generalize pedit action · 608cd71a

由 Alexei Starovoitov 提交于 3月 26, 2015

existing TC action 'pedit' can munge any bits of the packet.
Generalize it for use in bpf programs attached as cls_bpf and act_bpf via
bpf_skb_store_bytes() helper function.
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Reviewed-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

608cd71a

net: dsa: Add basic framework to support ndo_fdb functions · 339d8262

由 Guenter Roeck 提交于 3月 26, 2015

Provide callbacks for ndo_fdb_add, ndo_fdb_del, and ndo_fdb_dump.
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Tested-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

339d8262

tipc: involve reference counter for node structure · 8a0f6ebe

由 Ying Xue 提交于 3月 26, 2015

TIPC node hash node table is protected with rcu lock on read side.
tipc_node_find() is used to look for a node object with node address
through iterating the hash node table. As the entire process of what
tipc_node_find() traverses the table is guarded with rcu read lock,
it's safe for us. However, when callers use the node object returned
by tipc_node_find(), there is no rcu read lock applied. Therefore,
this is absolutely unsafe for callers of tipc_node_find().

Now we introduce a reference counter for node structure. Before
tipc_node_find() returns node object to its caller, it first increases
the reference counter. Accordingly, after its caller used it up,
it decreases the counter again. This can prevent a node being used by
one thread from being freed by another thread.
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NJon Maloy <jon.maloy@ericson.com>
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a0f6ebe

tipc: fix potential deadlock when all links are reset · b952b2be

由 Ying Xue 提交于 3月 26, 2015

[   60.988363] ======================================================
[   60.988754] [ INFO: possible circular locking dependency detected ]
[   60.989152] 3.19.0+ #194 Not tainted
[   60.989377] -------------------------------------------------------
[   60.989781] swapper/3/0 is trying to acquire lock:
[   60.990079]  (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[   60.990743]
[   60.990743] but task is already holding lock:
[   60.991106]  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.991738]
[   60.991738] which lock already depends on the new lock.
[   60.991738]
[   60.992174]
[   60.992174] the existing dependency chain (in reverse order) is:
[   60.992174]
-> #1 (&(&bclink->lock)->rlock){+.-...}:
[   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[   60.992174]        [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.992174]        [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
[   60.992174]        [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
[   60.992174]        [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
[   60.992174]        [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
[   60.992174]        [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
[   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
[   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
[   60.992174]
-> #0 (&(&n_ptr->lock)->rlock){+.-...}:
[   60.992174]        [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
[   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[   60.992174]        [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[   60.992174]        [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
[   60.992174]        [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
[   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
[   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
[   60.992174]
[   60.992174] other info that might help us debug this:
[   60.992174]
[   60.992174]  Possible unsafe locking scenario:
[   60.992174]
[   60.992174]        CPU0                    CPU1
[   60.992174]        ----                    ----
[   60.992174]   lock(&(&bclink->lock)->rlock);
[   60.992174]                                lock(&(&n_ptr->lock)->rlock);
[   60.992174]                                lock(&(&bclink->lock)->rlock);
[   60.992174]   lock(&(&n_ptr->lock)->rlock);
[   60.992174]
[   60.992174]  *** DEADLOCK ***
[   60.992174]
[   60.992174] 3 locks held by swapper/3/0:
[   60.992174]  #0:  (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
[   60.992174]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
[   60.992174]  #2:  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.992174]

The correct the sequence of grabbing n_ptr->lock and bclink->lock
should be that the former is first held and the latter is then taken,
which exactly happened on CPU1. But especially when the retransmission
of broadcast link is failed, bclink->lock is first held in
tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure()
called by tipc_link_retransmit() subsequently, which is demonstrated on
CPU0. As a result, deadlock occurs.

If the order of holding the two locks happening on CPU0 is reversed, the
deadlock risk will be relieved. Therefore, the node lock taken in
link_retransmit_failure() originally is moved to tipc_bclink_rcv()
so that it's obtained before bclink lock. But the precondition of
the adjustment of node lock is that responding to bclink reset event
must be moved from tipc_bclink_unlock() to tipc_node_unlock().
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b952b2be

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功