提交 · 6bd64ac0f9c264082241da0db0dcc72a13e672a8 · OpenHarmony / kernel_linux

17 5月, 2014 5 次提交

macvlan: Fix lockdep warnings with stacked macvlan devices · c674ac30

由 Vlad Yasevich 提交于 5月 16, 2014

Macvlan devices try to avoid stacking, but that's not always
successfull or even desired.  As an example, the following
configuration is perefectly legal and valid:

eth0 <--- macvlan0 <---- vlan0.10 <--- macvlan1

However, this configuration produces the following lockdep
trace:
[  115.620418] ======================================================
[  115.620477] [ INFO: possible circular locking dependency detected ]
[  115.620516] 3.15.0-rc1+ #24 Not tainted
[  115.620540] -------------------------------------------------------
[  115.620577] ip/1704 is trying to acquire lock:
[  115.620604]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
[  115.620686]
but task is already holding lock:
[  115.620723]  (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40
[  115.620795]
which lock already depends on the new lock.

[  115.620853]
the existing dependency chain (in reverse order) is:
[  115.620894]
-> #1 (&macvlan_netdev_addr_lock_key){+.....}:
[  115.620935]        [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
[  115.620974]        [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
[  115.621019]        [<ffffffffa07296c3>] vlan_dev_set_rx_mode+0x53/0x110 [8021q]
[  115.621066]        [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
[  115.621105]        [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
[  115.621143]        [<ffffffff815da6be>] __dev_open+0xde/0x140
[  115.621174]        [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
[  115.621174]        [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
[  115.621174]        [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
[  115.621174]        [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
[  115.621174]        [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
[  115.621174]        [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
[  115.621174]        [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
[  115.621174]        [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
[  115.621174]        [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
[  115.621174]        [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
[  115.621174]        [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
[  115.621174]        [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
[  115.621174]        [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
[  115.621174]        [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b
[  115.621174]
-> #0 (&vlan_netdev_addr_lock_key/1){+.....}:
[  115.621174]        [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60
[  115.621174]        [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
[  115.621174]        [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
[  115.621174]        [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
[  115.621174]        [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan]
[  115.621174]        [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
[  115.621174]        [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
[  115.621174]        [<ffffffff815da6be>] __dev_open+0xde/0x140
[  115.621174]        [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
[  115.621174]        [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
[  115.621174]        [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
[  115.621174]        [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
[  115.621174]        [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
[  115.621174]        [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
[  115.621174]        [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
[  115.621174]        [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
[  115.621174]        [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
[  115.621174]        [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
[  115.621174]        [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
[  115.621174]        [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
[  115.621174]        [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
[  115.621174]        [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b
[  115.621174]
other info that might help us debug this:

[  115.621174]  Possible unsafe locking scenario:

[  115.621174]        CPU0                    CPU1
[  115.621174]        ----                    ----
[  115.621174]   lock(&macvlan_netdev_addr_lock_key);
[  115.621174]                                lock(&vlan_netdev_addr_lock_key/1);
[  115.621174]                                lock(&macvlan_netdev_addr_lock_key);
[  115.621174]   lock(&vlan_netdev_addr_lock_key/1);
[  115.621174]
 *** DEADLOCK ***

[  115.621174] 2 locks held by ip/1704:
[  115.621174]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815e6dbb>] rtnetlink_rcv+0x1b/0x40
[  115.621174]  #1:  (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40
[  115.621174]
stack backtrace:
[  115.621174] CPU: 3 PID: 1704 Comm: ip Not tainted 3.15.0-rc1+ #24
[  115.621174] Hardware name: Hewlett-Packard HP xw8400 Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010
[  115.621174]  ffffffff82339ae0 ffff880465f79568 ffffffff816ee20c ffffffff82339ae0
[  115.621174]  ffff880465f795a8 ffffffff816e9e1b ffff880465f79600 ffff880465b019c8
[  115.621174]  0000000000000001 0000000000000002 ffff880465b019c8 ffff880465b01230
[  115.621174] Call Trace:
[  115.621174]  [<ffffffff816ee20c>] dump_stack+0x4d/0x66
[  115.621174]  [<ffffffff816e9e1b>] print_circular_bug+0x200/0x20e
[  115.621174]  [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60
[  115.621174]  [<ffffffff810d3172>] ? trace_hardirqs_on_caller+0xb2/0x1d0
[  115.621174]  [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
[  115.621174]  [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80
[  115.621174]  [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
[  115.621174]  [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80
[  115.621174]  [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
[  115.621174]  [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan]
[  115.621174]  [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
[  115.621174]  [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
[  115.621174]  [<ffffffff815da6be>] __dev_open+0xde/0x140
[  115.621174]  [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
[  115.621174]  [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
[  115.621174]  [<ffffffff811e1db1>] ? mem_cgroup_bad_page_check+0x21/0x30
[  115.621174]  [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
[  115.621174]  [<ffffffff810d394c>] ? __lock_acquire+0x37c/0x1a60
[  115.621174]  [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
[  115.621174]  [<ffffffff815ea169>] ? rtnl_newlink+0xe9/0x730
[  115.621174]  [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
[  115.621174]  [<ffffffff810d329d>] ? trace_hardirqs_on+0xd/0x10
[  115.621174]  [<ffffffff815e6dbb>] ? rtnetlink_rcv+0x1b/0x40
[  115.621174]  [<ffffffff815e6de0>] ? rtnetlink_rcv+0x40/0x40
[  115.621174]  [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
[  115.621174]  [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
[  115.621174]  [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
[  115.621174]  [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
[  115.621174]  [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
[  115.621174]  [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0
[  115.621174]  [<ffffffff8119d4f8>] ? might_fault+0xa8/0xb0
[  115.621174]  [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0
[  115.621174]  [<ffffffff815cb51e>] ? verify_iovec+0x5e/0xe0
[  115.621174]  [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
[  115.621174]  [<ffffffff816faa0d>] ? __do_page_fault+0x11d/0x570
[  115.621174]  [<ffffffff810cfe9f>] ? up_read+0x1f/0x40
[  115.621174]  [<ffffffff816fab04>] ? __do_page_fault+0x214/0x570
[  115.621174]  [<ffffffff8120a10b>] ? mntput_no_expire+0x6b/0x1c0
[  115.621174]  [<ffffffff8120a0b7>] ? mntput_no_expire+0x17/0x1c0
[  115.621174]  [<ffffffff8120a284>] ? mntput+0x24/0x40
[  115.621174]  [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
[  115.621174]  [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
[  115.621174]  [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b

Fix this by correctly providing macvlan lockdep class.
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c674ac30

vlan: Fix lockdep warning with stacked vlan devices. · d38569ab

由 Vlad Yasevich 提交于 5月 16, 2014

This reverts commit dc8eaaa0.
	vlan: Fix lockdep warning when vlan dev handle notification

Instead we use the new new API to find the lock subclass of
our vlan device.  This way we can support configurations where
vlans are interspersed with other devices:
  bond -> vlan -> macvlan -> vlan
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d38569ab

net: Allow for more then a single subclass for netif_addr_lock · 25175ba5

由 Vlad Yasevich 提交于 5月 16, 2014

Currently netif_addr_lock_nested assumes that there can be only
a single nesting level between 2 devices.  However, if we
have multiple devices of the same type stacked, this fails.
For example:
 eth0 <-- vlan0.10 <-- vlan0.10.20

A more complicated configuration may stack more then one type of
device in different order.
Ex:
  eth0 <-- vlan0.10 <-- macvlan0 <-- vlan1.10.20 <-- macvlan1

This patch adds an ndo_* function that allows each stackable
device to report its nesting level.  If the device doesn't
provide this function default subclass of 1 is used.
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25175ba5

net: Find the nesting level of a given device by type. · 4085ebe8

由 Vlad Yasevich 提交于 5月 16, 2014

Multiple devices in the kernel can be stacked/nested and they
need to know their nesting level for the purposes of lockdep.
This patch provides a generic function that determines a nesting
level of a particular device by its type (ex: vlan, macvlan, etc).
We only care about nesting of the same type of devices.

For example:
  eth0 <- vlan0.10 <- macvlan0 <- vlan1.20

The nesting level of vlan1.20 would be 1, since there is another vlan
in the stack under it.
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4085ebe8

net/mlx4_core: Add UPDATE_QP SRIOV wrapper support · ce8d9e0d

由 Matan Barak 提交于 5月 15, 2014

This patch adds UPDATE_QP SRIOV wrapper support.

The mechanism is a general one, but currently only source MAC
index changes are allowed for VFs.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce8d9e0d

16 5月, 2014 2 次提交

ipv6: update Destination Cache entries when gateway turn into host · be7a010d

由 Duan Jiong 提交于 5月 15, 2014

RFC 4861 states in 7.2.5:

	The IsRouter flag in the cache entry MUST be set based on the
         Router flag in the received advertisement.  In those cases
         where the IsRouter flag changes from TRUE to FALSE as a result
         of this update, the node MUST remove that router from the
         Default Router List and update the Destination Cache entries
         for all destinations using that neighbor as a router as
         specified in Section 7.3.3.  This is needed to detect when a
         node that is used as a router stops forwarding packets due to
         being configured as a host.

Currently, when dealing with NA Message which IsRouter flag changes from
TRUE to FALSE, the kernel only removes router from the Default Router List,
and don't update the Destination Cache entries.

Now in order to update those Destination Cache entries, i introduce
function rt6_clean_tohost().
Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

be7a010d

rtnetlink: wait for unregistering devices in rtnl_link_unregister() · 200b916f

由 Cong Wang 提交于 5月 12, 2014

From: Cong Wang <cwang@twopensource.com>

commit 50624c93 (net: Delay default_device_exit_batch until no
devices are unregistering) introduced rtnl_lock_unregistering() for
default_device_exit_batch(). Same race could happen we when rmmod a driver
which calls rtnl_link_unregister() as we call dev->destructor without rtnl
lock.

For long term, I think we should clean up the mess of netdev_run_todo()
and net namespce exit code.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NCong Wang <cwang@twopensource.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

200b916f

14 5月, 2014 1 次提交

net: avoid dependency of net_get_random_once on nop patching · 3d440522

由 Hannes Frederic Sowa 提交于 5月 11, 2014

net_get_random_once depends on the static keys infrastructure to patch up
the branch to the slow path during boot. This was realized by abusing the
static keys api and defining a new initializer to not enable the call
site while still indicating that the branch point should get patched
up. This was needed to have the fast path considered likely by gcc.

The static key initialization during boot up normally walks through all
the registered keys and either patches in ideal nops or enables the jump
site but omitted that step on x86 if ideal nops where already placed at
static_key branch points. Thus net_get_random_once branches not always
became active.

This patch switches net_get_random_once to the ordinary static_key
api and thus places the kernel fast path in the - by gcc considered -
unlikely path. Microbenchmarks on Intel and AMD x86-64 showed that
the unlikely path actually beats the likely path in terms of cycle cost
and that different nop patterns did not make much difference, thus this
switch should not be noticeable.

Fixes: a48e4292 ("net: introduce new macro net_get_random_once")
Reported-by: NTuomas Räsänen <tuomasjjrasanen@tjjr.fi>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d440522

13 5月, 2014 1 次提交

nl80211: fix NL80211_FEATURE_P2P_DEVICE_NEEDS_CHANNEL API · f5651986

由 Johannes Berg 提交于 5月 13, 2014

My commit removing that also removed it from the header file
which can break compilation of userspace that needed it, add
it back for API/ABI compatibility purposes (but no code to
implement anything for it.)
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

f5651986

09 5月, 2014 2 次提交

ping: move ping_group_range out of CONFIG_SYSCTL · ba6b918a

由 Cong Wang 提交于 5月 06, 2014

Similarly, when CONFIG_SYSCTL is not set, ping_group_range should still
work, just that no one can change it. Therefore we should move it out of
sysctl_net_ipv4.c. And, it should not share the same seqlock with
ip_local_port_range.

BTW, rename it to ->ping_group_range instead.

Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: NStefan de Konink <stefan@konink.de>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba6b918a

ipv4: move local_port_range out of CONFIG_SYSCTL · c9d8f1a6

由 Cong Wang 提交于 5月 06, 2014

When CONFIG_SYSCTL is not set, ip_local_port_range should still work,
just that no one can change it. Therefore we should move it out of sysctl_inet.c.
Also, rename it to ->ip_local_ports instead.

Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: NStefan de Konink <stefan@konink.de>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9d8f1a6

08 5月, 2014 2 次提交

net: mdio: of_mdiobus_register(): fall back to mdiobus_register() for !CONFIG_OF · 23a456f0

由 Daniel Mack 提交于 5月 06, 2014

If CONFIG_OF is not set, make of_mdiobus_register() call
mdiobus_register() instead of returning -ENOSYS.

This way, we can just call of_mdiobus_register() from all DT-enabled
drivers to handle the compat cases.
Signed-off-by: NDaniel Mack <zonque@gmail.com>
Suggested-by: NFlorian Fainelli <f.fainelli@gmail.com>
Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
Acked-by: NMugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23a456f0

Revert "net: core: introduce netif_skb_dev_features" · c1e756bf

由 Florian Westphal 提交于 5月 05, 2014

This reverts commit d2069403,
there are no more callers.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c1e756bf

06 5月, 2014 1 次提交

vsock: Make transport the proto owner · 2c4a336e

由 Andy King 提交于 5月 01, 2014

Right now the core vsock module is the owner of the proto family. This
means there's nothing preventing the transport module from unloading if
there are open sockets, which results in a panic. Fix that by allowing
the transport to be the owner, which will refcount it properly.

Includes version bump to 1.0.1.0-k

Passes checkpatch this time, I swear...
Acked-by: NDmitry Torokhov <dtor@vmware.com>
Signed-off-by: NAndy King <acking@vmware.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2c4a336e

05 5月, 2014 1 次提交

cfg80211: add cfg80211_sched_scan_stopped_rtnl · 792e6aa7

由 Eliad Peller 提交于 4月 30, 2014

Add locked-version for cfg80211_sched_scan_stopped.
This is used for some users that might want to
call it when rtnl is already locked.

Fixes: d43c6b6e ("mac80211: reschedule sched scan after HW restart")
Cc: stable@vger.kernel.org (3.14+)
Signed-off-by: NEliad Peller <eliadx.peller@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

792e6aa7

04 5月, 2014 2 次提交

Revert "tty: Fix race condition between __tty_buffer_request_room and flush_to_ldisc" · 5fbf1a65

由 Peter Hurley 提交于 5月 02, 2014

This reverts commit 6a20dbd6.

Although the commit correctly identifies an unsafe race condition
between __tty_buffer_request_room() and flush_to_ldisc(), the commit
fixes the race with an unnecessary spinlock in a lockless algorithm.

The follow-on commit, "tty: Fix lockless tty buffer race" fixes
the race locklessly.
Signed-off-by: NPeter Hurley <peter@hurleysoftware.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

5fbf1a65

arm64: fixmap: fix missing sub-page offset for earlyprintk · f774b7d1

由 Marc Zyngier 提交于 4月 28, 2014

Commit d57c33c5 (add generic fixmap.h) added (among other
similar things) set_fixmap_io to deal with early ioremap of devices.

More recently, commit bf4b558e (arm64: add early_ioremap support)
converted the arm64 earlyprintk to use set_fixmap_io. A side effect of
this conversion is that my virtual machines have stopped booting when
I pass "earlyprintk=uart8250-8bit,0x3f8" to the guest kernel.

Turns out that the new earlyprintk code doesn't care at all about
sub-page offsets, and just assumes that the earlyprintk device will
be page-aligned. Obviously, that doesn't play well with the above example.

Further investigation shows that set_fixmap_io uses __set_fixmap instead
of __set_fixmap_offset. A fix is to introduce a set_fixmap_offset_io that
uses the latter, and to remove the superflous call to fix_to_virt
(which only returns the value that set_fixmap_io has already given us).

With this applied, my VMs are back in business. Tested on a Cortex-A57
platform with kvmtool as platform emulation.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: NMark Salter <msalter@redhat.com>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

f774b7d1

01 5月, 2014 1 次提交

word-at-a-time: simplify big-endian zero_bytemask macro · 789ce9dc

由 H. Peter Anvin 提交于 4月 30, 2014

This is simpler and cleaner.  Depending on architecture, a smart
compiler may or may not generate the same code.
Acked-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

789ce9dc

28 4月, 2014 4 次提交

ftrace/module: Hardcode ftrace_module_init() call into load_module() · a949ae56

由 Steven Rostedt (Red Hat) 提交于 4月 24, 2014

A race exists between module loading and enabling of function tracer.

	CPU 1				CPU 2
	-----				-----
  load_module()
   module->state = MODULE_STATE_COMING

				register_ftrace_function()
				 mutex_lock(&ftrace_lock);
				 ftrace_startup()
				  update_ftrace_function();
				   ftrace_arch_code_modify_prepare()
				    set_all_module_text_rw();
				   <enables-ftrace>
				    ftrace_arch_code_modify_post_process()
				     set_all_module_text_ro();

				[ here all module text is set to RO,
				  including the module that is
				  loading!! ]

   blocking_notifier_call_chain(MODULE_STATE_COMING);
    ftrace_init_module()

     [ tries to modify code, but it's RO, and fails!
       ftrace_bug() is called]

When this race happens, ftrace_bug() will produces a nasty warning and
all of the function tracing features will be disabled until reboot.

The simple solution is to treate module load the same way the core
kernel is treated at boot. To hardcode the ftrace function modification
of converting calls to mcount into nops. This is done in init/main.c
there's no reason it could not be done in load_module(). This gives
a better control of the changes and doesn't tie the state of the
module to its notifiers as much. Ftrace is special, it needs to be
treated as such.

The reason this would work, is that the ftrace_module_init() would be
called while the module is in MODULE_STATE_UNFORMED, which is ignored
by the set_all_module_text_ro() call.

Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.comReported-by: NTakao Indoh <indou.takao@jp.fujitsu.com>
Acked-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a949ae56

genirq: x86: Ensure that dynamic irq allocation does not conflict · 62a08ae2

由 Thomas Gleixner 提交于 4月 24, 2014

On x86 the allocation of irq descriptors may allocate interrupts which
are in the range of the GSI interrupts. That's wrong as those
interrupts are hardwired and we don't have the irq domain translation
like PPC. So one of these interrupts can be hooked up later to one of
the devices which are hard wired to it and the io_apic init code for
that particular interrupt line happily reuses that descriptor with a
completely different configuration so hell breaks lose.

Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs,
except for a few usage sites which have not yet blown up in our face
for whatever reason. But for drivers which need an irq range, like the
GPIO drivers, we have no limit in place and we don't want to expose
such a detail to a driver.

To cure this introduce a function which an architecture can implement
to impose a lower bound on the dynamic interrupt allocations.

Implement it for x86 and set the lower bound to nr_gsi_irqs, which is
the end of the hardwired interrupt space, so all dynamic allocations
happen above.

That not only allows the GPIO driver to work sanely, it also protects
the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and
htirq code. They need to be cleaned up as well, but that's a separate
issue.
Reported-by: NJin Yao <yao.jin@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Krogerus Heikki <heikki.krogerus@intel.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

62a08ae2

linux/interrupt.h: fix new kernel-doc warnings · def5f127

由 Randy Dunlap 提交于 4月 27, 2014

Fix new kernel-doc warnings in <linux/interrupt.h>:

Warning(include/linux/interrupt.h:219): No description found for parameter 'cpumask'
Warning(include/linux/interrupt.h:219): Excess function parameter 'mask' description in 'irq_set_affinity'
Warning(include/linux/interrupt.h:236): No description found for parameter 'cpumask'
Warning(include/linux/interrupt.h:236): Excess function parameter 'mask' description in 'irq_force_affinity'
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Link: http://lkml.kernel.org/r/535DD2FD.7030804@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

def5f127

word-at-a-time: avoid undefined behaviour in zero_bytemask macro · ec6931b2

由 Will Deacon 提交于 4月 23, 2014

The asm-generic, big-endian version of zero_bytemask creates a mask of
bytes preceding the first zero-byte by left shifting ~0ul based on the
position of the first zero byte.

Unfortunately, if the first (top) byte is zero, the output of
prep_zero_mask has only the top bit set, resulting in undefined C
behaviour as we shift left by an amount equal to the width of the type.
As it happens, GCC doesn't manage to spot this through the call to fls(),
but the issue remains if architectures choose to implement their shift
instructions differently.

An example would be arch/arm/ (AArch32), where LSL Rd, Rn, #32 results
in Rd == 0x0, whilst on arch/arm64 (AArch64) LSL Xd, Xn, #64 results in
Xd == Xn.

Rather than check explicitly for the problematic shift, this patch adds
an extra shift by 1, replacing fls with __fls. Since zero_bytemask is
never called with a zero argument (has_zero() is used to check the data
first), we don't need to worry about calling __fls(0), which is
undefined.

Cc: <stable@vger.kernel.org>
Cc: Victor Kamensky <victor.kamensky@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ec6931b2

25 4月, 2014 6 次提交

tty: Fix race condition between __tty_buffer_request_room and flush_to_ldisc · 6a20dbd6

由 Manfred Schlaegl 提交于 4月 08, 2014

The race was introduced while development of linux-3.11 by
e8437d7e and
e9975fde.
Originally it was found and reproduced on linux-3.12.15 and
linux-3.12.15-rt25, by sending 500 byte blocks with 115kbaud to the
target uart in a loop with 100 milliseconds delay.

In short:
 1. The consumer flush_to_ldisc is on to remove the head tty_buffer.
 2. The producer adds a number of bytes, so that a new tty_buffer must
	be allocated and added by __tty_buffer_request_room.
 3. The consumer removes the head tty_buffer element, without handling
	newly committed data.

Detailed example:
 * Initial buffer:
   * Head, Tail -> 0: used=250; commit=250; read=240; next=NULL
 * Consumer: ''flush_to_ldisc''
   * consumed 10 Byte
   * buffer:
     * Head, Tail -> 0: used=250; commit=250; read=250; next=NULL
{{{
		count = head->commit - head->read;	// count = 0
		if (!count) {				// enter
			// INTERRUPTED BY PRODUCER ->
			if (head->next == NULL)
				break;
			buf->head = head->next;
			tty_buffer_free(port, head);
			continue;
		}
}}}
 * Producer: tty_insert_flip_... 10 bytes + tty_flip_buffer_push
   * buffer:
     * Head, Tail -> 0: used=250; commit=250; read=250; next=NULL
   * added 6 bytes: head-element filled to maximum.
     * buffer:
       * Head, Tail -> 0: used=256; commit=250; read=250; next=NULL
   * added 4 bytes: __tty_buffer_request_room is called
     * buffer:
       * Head -> 0: used=256; commit=256; read=250; next=1
       * Tail -> 1: used=4; commit=0; read=250 next=NULL
   * push (tty_flip_buffer_push)
     * buffer:
       * Head -> 0: used=256; commit=256; read=250; next=1
       * Tail -> 1: used=4; commit=4; read=250 next=NULL
 * Consumer
{{{
		count = head->commit - head->read;
		if (!count) {
			// INTERRUPTED BY PRODUCER <-
			if (head->next == NULL)		// -> no break
				break;
			buf->head = head->next;
			tty_buffer_free(port, head);
			// ERROR: tty_buffer head freed -> 6 bytes lost
			continue;
		}
}}}

This patch reintroduces a spin_lock to protect this case. Perhaps later
a lock-less solution could be found.
Signed-off-by: NManfred Schlaegl <manfred.schlaegl@gmx.at>
Cc: stable <stable@vger.kernel.org> # 3.11
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

6a20dbd6

of/irq: do irq resolution in platform_get_irq · 9ec36caf

由 Rob Herring 提交于 4月 23, 2014

Currently we get the following kind of errors if we try to use interrupt
phandles to irqchips that have not yet initialized:

irq: no irq domain found for /ocp/pinmux@48002030 !
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at drivers/of/platform.c:171 of_device_alloc+0x144/0x184()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-00038-g42a9708 #1012
(show_stack+0x14/0x1c)
(dump_stack+0x6c/0xa0)
(warn_slowpath_common+0x64/0x84)
(warn_slowpath_null+0x1c/0x24)
(of_device_alloc+0x144/0x184)
(of_platform_device_create_pdata+0x44/0x9c)
(of_platform_bus_create+0xd0/0x170)
(of_platform_bus_create+0x12c/0x170)
(of_platform_populate+0x60/0x98)

This is because we're wrongly trying to populate resources that are not
yet available. It's perfectly valid to create irqchips dynamically, so
let's fix up the issue by resolving the interrupt resources when
platform_get_irq is called.

And then we also need to accept the fact that some irqdomains do not
exist that early on, and only get initialized later on. So we can
make the current WARN_ON into just into a pr_debug().

We still attempt to populate irq resources when we create the devices.
This allows current drivers which don't use platform_get_irq to continue
to function. Once all drivers are fixed, this code can be removed.
Suggested-by: NRussell King <linux@arm.linux.org.uk>
Signed-off-by: NRob Herring <robh@kernel.org>
Signed-off-by: NTony Lindgren <tony@atomide.com>
Tested-by: NTony Lindgren <tony@atomide.com>
Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: NGrant Likely <grant.likely@linaro.org>

9ec36caf

phy: core: make NULL a valid phy reference if !CONFIG_GENERIC_PHY · 2b97789f

由 Grygorii Strashko 提交于 4月 19, 2014

This fixes a regression on Keystone 2 platforms caused by patch
57303488
"usb: dwc3: adapt dwc3 core to use Generic PHY Framework" which adds
optional support of generic phy in DWC3 core.

On Keystone 2 platforms the USB is not working now because
CONFIG_GENERIC_PHY isn't set and, as result, Generic PHY APIs stubs
return -ENOSYS always. The log shows:
 dwc3 2690000.dwc3: failed to initialize core
 dwc3: probe of 2690000.dwc3 failed with error -38

Hence, fix it by making NULL a valid phy reference in Generic PHY
APIs stubs in the same way as it was done by the patch
04c2faca "drivers: phy: Make NULL
a valid phy reference".
Acked-by: NFelipe Balbi <balbi@ti.com>
Acked-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NKishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

2b97789f

net: Add variants of capable for use on netlink messages · aa4cf945

由 Eric W. Biederman 提交于 4月 23, 2014

netlink_net_capable - The common case use, for operations that are safe on a network namespace
netlink_capable - For operations that are only known to be safe for the global root
netlink_ns_capable - The general case of capable used to handle special cases

__netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of
		       the skbuff of a netlink message.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa4cf945

net: Add variants of capable for use on on sockets · a3b299da

由 Eric W. Biederman 提交于 4月 23, 2014

sk_net_capable - The common case, operations that are safe in a network namespace.
sk_capable - Operations that are not known to be safe in a network namespace
sk_ns_capable - The general case for special cases.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a3b299da

net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dump · a53b72c8

由 Eric W. Biederman 提交于 4月 23, 2014

The permission check in sock_diag_put_filterinfo is wrong, and it is so removed
from it's sources it is not clear why it is wrong. Move the computation
into packet_diag_dump and pass a bool of the result into sock_diag_filterinfo.

This does not yet correct the capability check but instead simply moves it to make
it clear what is going on.
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a53b72c8

24 4月, 2014 2 次提交

dt: tegra: remove non-existent clock IDs · 9ef1af9e

由 Stephen Warren 提交于 4月 01, 2014

The Tegra124 clock DT binding currently provides 3 clocks that don't
actually exist; 2 for NAND and one for UART5/UARTE. Delete these. While
this is technically an incompatible DT ABI change, nothing could have
used these clock IDs for anything practical, since the HW doesn't exist.

Cc: <stable@vger.kernel.org>
Signed-off-by: NStephen Warren <swarren@nvidia.com>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

9ef1af9e

locks: rename FL_FILE_PVT and IS_FILE_PVT to use "*_OFDLCK" instead · cff2fce5

由 Jeff Layton 提交于 4月 22, 2014

File-private locks have been re-christened as "open file description"
locks. Finish the symbol name cleanup in the internal implementation.
Signed-off-by: NJeff Layton <jlayton@redhat.com>

cff2fce5

23 4月, 2014 2 次提交

Fix: tracing: use 'E' instead of 'X' for unsigned module tain flag · 132e9a68

由 Mathieu Desnoyers 提交于 4月 15, 2014

In the following commit:

commit 57673c2b
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Mon Mar 31 14:39:57 2014 +1030

    Use 'E' instead of 'X' for unsigned module taint flag.

One site has been forgotten in trace events module.h.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

132e9a68

net: Fix ns_capable check in sock_diag_put_filterinfo · 78541c1d

由 Andrew Lutomirski 提交于 4月 16, 2014

The caller needs capabilities on the namespace being queried, not on
their own namespace.  This is a security bug, although it likely has
only a minor impact.

Cc: stable@vger.kernel.org
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

78541c1d

22 4月, 2014 1 次提交

locks: rename file-private locks to "open file description locks" · 0d3f7a2d

由 Jeff Layton 提交于 4月 22, 2014

File-private locks have been merged into Linux for v3.15, and *now*
people are commenting that the name and macro definitions for the new
file-private locks suck.

...and I can't even disagree. The names and command macros do suck.

We're going to have to live with these for a long time, so it's
important that we be happy with the names before we're stuck with them.
The consensus on the lists so far is that they should be rechristened as
"open file description locks".

The name isn't a big deal for the kernel, but the command macros are not
visually distinct enough from the traditional POSIX lock macros. The
glibc and documentation folks are recommending that we change them to
look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a
programmer will typo one of the commands wrong, and also makes it easier
to spot this difference when reading code.

This patch makes the following changes that I think are necessary before
v3.15 ships:

1) rename the command macros to their new names. These end up in the uapi
   headers and so are part of the external-facing API. It turns out that
   glibc doesn't actually use the fcntl.h uapi header, but it's hard to
   be sure that something else won't. Changing it now is safest.

2) make the the /proc/locks output display these as type "OFDLCK"

Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Stefan Metzmacher <metze@samba.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Frank Filz <ffilzlnx@mindspring.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NJeff Layton <jlayton@redhat.com>

0d3f7a2d

20 4月, 2014 2 次提交

Input: Add INPUT_PROP_TOPBUTTONPAD device property · f37c0134

由 Hans de Goede 提交于 4月 19, 2014

On some newer laptops with a trackpoint the physical buttons for the
trackpoint have been removed to allow for a larger touchpad. On these
laptops the buttonpad has clearly marked areas on the top which are to be
used as trackpad buttons.

Users of the event device-node need to know about this, so that they can
properly interpret BTN_LEFT events as being a left / right / middle click
depending on where on the button pad the clicking finger is.

This commits adds a INPUT_PROP_TOPBUTTONPAD device property which drivers
for such buttonpads will use to signal to the user that this buttonpad not
only has the normal bottom button area, but also a top button area.
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

f37c0134

Input: serio - add firmware_id sysfs attribute · 0456c66f

由 Hans de Goede 提交于 4月 19, 2014

serio devices exposed via platform firmware interfaces such as ACPI may
provide additional identifying information of use to userspace.

We don't associate the serio devices with the firmware device (we don't
set it as parent), so there's no way for userspace to make use of this
information.

We cannot change the parent for serio devices instantiated though a
firmware interface as that would break suspend / resume ordering.

Therefore this patch adds a new firmware_id sysfs attribute so that
userspace can get a string from there with any additional identifying
information the firmware interface may provide.
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Acked-by: NPeter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

0456c66f

19 4月, 2014 5 次提交

mm: use paravirt friendly ops for NUMA hinting ptes · 29c77870

由 Mel Gorman 提交于 4月 18, 2014

David Vrabel identified a regression when using automatic NUMA balancing
under Xen whereby page table entries were getting corrupted due to the
use of native PTE operations.  Quoting him

	Xen PV guest page tables require that their entries use machine
	addresses if the preset bit (_PAGE_PRESENT) is set, and (for
	successful migration) non-present PTEs must use pseudo-physical
	addresses.  This is because on migration MFNs in present PTEs are
	translated to PFNs (canonicalised) so they may be translated back
	to the new MFN in the destination domain (uncanonicalised).

	pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma()
	set and clear the _PAGE_PRESENT bit using pte_set_flags(),
	pte_clear_flags(), etc.

	In a Xen PV guest, these functions must translate MFNs to PFNs
	when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
	_PAGE_PRESENT.

His suggested fix converted p[te|md]_[set|clear]_flags to using
paravirt-friendly ops but this is overkill.  He suggested an alternative
of using p[te|md]_modify in the NUMA page table operations but this is
does more work than necessary and would require looking up a VMA for
protections.

This patch modifies the NUMA page table operations to use paravirt
friendly operations to set/clear the flags of interest.  Unfortunately
this will take a performance hit when updating the PTEs on
CONFIG_PARAVIRT but I do not see a way around it that does not break
Xen.
Signed-off-by: NMel Gorman <mgorman@suse.de>
Acked-by: NDavid Vrabel <david.vrabel@citrix.com>
Tested-by: NDavid Vrabel <david.vrabel@citrix.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

29c77870

wait: explain the shadowing and type inconsistencies · 8b32201d

由 Peter Zijlstra 提交于 4月 18, 2014

Stick in a comment before someone else tries to fix the sparse warning
this generates.
Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-o2ro6f3vkxklni0bc8f7m68s@git.kernel.orgSigned-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8b32201d

Shiraz has moved · 9cc23682

由 Viresh Kumar 提交于 4月 18, 2014

shiraz.hashim@st.com email-id doesn't exist anymore as he has left the
company.  Replace ST's id with shiraz.linux.kernel@gmail.com.

It also updates .mailmap file to fix address for 'git shortlog'.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Cc: Shiraz Hashim <shiraz.linux.kernel@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9cc23682

net: sctp: cache auth_enable per endpoint · b14878cc

由 Vlad Yasevich 提交于 4月 17, 2014

Currently, it is possible to create an SCTP socket, then switch
auth_enable via sysctl setting to 1 and crash the system on connect:

Oops[#1]:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
[...]
Call Trace:
[<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
[<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
[<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
[<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
[<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
[<ffffffff8043af68>] sctp_rcv+0x588/0x630
[<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
[<ffffffff803acb50>] ip6_input+0x2c0/0x440
[<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
[<ffffffff80310650>] process_backlog+0xb4/0x18c
[<ffffffff80313cbc>] net_rx_action+0x12c/0x210
[<ffffffff80034254>] __do_softirq+0x17c/0x2ac
[<ffffffff800345e0>] irq_exit+0x54/0xb0
[<ffffffff800075a4>] ret_from_irq+0x0/0x4
[<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
[<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
[<ffffffff805a88b0>] start_kernel+0x37c/0x398
Code: dd0900b8  000330f8  0126302d <dcc60000> 50c0fff1  0047182a  a48306a0
03e00008  00000000
---[ end trace b530b0551467f2fd ]---
Kernel panic - not syncing: Fatal exception in interrupt

What happens while auth_enable=0 in that case is, that
ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
when endpoint is being created.

After that point, if an admin switches over to auth_enable=1,
the machine can crash due to NULL pointer dereference during
reception of an INIT chunk. When we enter sctp_process_init()
via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
the INIT verification succeeds and while we walk and process
all INIT params via sctp_process_param() we find that
net->sctp.auth_enable is set, therefore do not fall through,
but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
dereference what we have set to NULL during endpoint
initialization phase.

The fix is to make auth_enable immutable by caching its value
during endpoint initialization, so that its original value is
being carried along until destruction. The bug seems to originate
from the very first days.

Fix in joint work with Daniel Borkmann.
Reported-by: NJoshua Kinard <kumba@gentoo.org>
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Tested-by: NJoshua Kinard <kumba@gentoo.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b14878cc

libata/ahci: accommodate tag ordered controllers · 8a4aeec8

由 Dan Williams 提交于 4月 17, 2014

The AHCI spec allows implementations to issue commands in tag order
rather than FIFO order:

	5.3.2.12 P:SelectCmd
	HBA sets pSlotLoc = (pSlotLoc + 1) mod (CAP.NCS + 1)
	or HBA selects the command to issue that has had the
	PxCI bit set to '1' longer than any other command
	pending to be issued.

The result is that commands posted sequentially (time-wise) may play out
of sequence when issued by hardware.

This behavior has likely been hidden by drives that arrange for commands
to complete in issue order.  However, it appears recent drives (two from
different vendors that we have found so far) inflict out-of-order
completions as a matter of course.  So, we need to take care to maintain
ordered submission, otherwise we risk triggering a drive to fall out of
sequential-io automation and back to random-io processing, which incurs
large latency and degrades throughput.

This issue was found in simple benchmarks where QD=2 seq-write
performance was 30-50% *greater* than QD=32 seq-write performance.

Tagging for -stable and making the change globally since it has a low
risk-to-reward ratio.  Also, word is that recent versions of an unnamed
OS also does it this way now.  So, drives in the field are already
experienced with this tag ordering scheme.

Cc: <stable@vger.kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ed Ciechanowski <ed.ciechanowski@intel.com>
Reviewed-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

8a4aeec8

OpenHarmony / kernel_linux 上一次同步 3 年多

OpenHarmony / kernel_linux
上一次同步 3 年多