提交 · 60e79c922fa8e593d9adc26bd3489008734f0eaa · openeuler / Kernel

27 12月, 2019 40 次提交

KVM: arm64: Reset the PMU in preemptible context · 60e79c92

由 Marc Zyngier 提交于 5月 04, 2019

[ Upstream commit ebff0b0e ]

We've become very cautious to now always reset the vcpu when nothing
is loaded on the physical CPU. To do so, we now disable preemption
and do a kvm_arch_vcpu_put() to make sure we have all the state
in memory (and that it won't be loaded behind out back).

This now causes issues with resetting the PMU, which calls into perf.
Perf itself uses mutexes, which clashes with the lack of preemption.
It is worth realizing that the PMU is fully emulated, and that
no PMU state is ever loaded on the physical CPU. This means we can
perfectly reset the PMU outside of the non-preemptible section.

Fixes: e761a927 ("KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded")
Reported-by: NJulien Grall <julien.grall@arm.com>
Tested-by: NJulien Grall <julien.grall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

60e79c92

serial: ar933x_uart: Fix build failure with disabled console · 6ac639f8

由 Petr Štetiar 提交于 5月 04, 2019

[ Upstream commit 72ff51d8 ]

Andrey has reported on OpenWrt's bug tracking system[1], that he
currently can't use ar93xx_uart as pure serial UART without console
(CONFIG_SERIAL_8250_CONSOLE and CONFIG_SERIAL_AR933X_CONSOLE undefined),
because compilation ends with following error:

 ar933x_uart.c: In function 'ar933x_uart_console_write':
 ar933x_uart.c:550:14: error: 'struct uart_port' has no
                               member named 'sysrq'

So this patch moves all the code related to console handling behind
series of CONFIG_SERIAL_AR933X_CONSOLE ifdefs.

1. https://bugs.openwrt.org/index.php?do=details&task_id=2152

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Andrey Batyiev <batyiev@gmail.com>
Reported-by: NAndrey Batyiev <batyiev@gmail.com>
Tested-by: NAndrey Batyiev <batyiev@gmail.com>
Signed-off-by: NPetr Štetiar <ynezz@true.cz>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

6ac639f8

sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init() · 1c30433d

由 Mao Wenan 提交于 5月 04, 2019

[ Upstream commit ac0cdb3d ]

Add the missing uart_unregister_driver() and i2c_del_driver() before return
from sc16is7xx_init() in the error handling case.
Signed-off-by: NMao Wenan <maowenan@huawei.com>
Reviewed-by: NVladimir Zapolskiy <vz@mleia.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

1c30433d

ARM: imx51: fix a leaked reference by adding missing of_node_put · eb3d2620

由 Wen Yang 提交于 5月 04, 2019

[ Upstream commit 0c17e83f ]

The call to of_get_next_child returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.

Detected by coccinelle with the following warnings:
./arch/arm/mach-imx/mach-imx51.c:64:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 57, but without a corresponding object release within this function.
Signed-off-by: NWen Yang <wen.yang99@zte.com.cn>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NShawn Guo <shawnguo@kernel.org>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

eb3d2620

s390/qeth: fix race when initializing the IP address table · 35835c39

由 Julian Wiedmann 提交于 5月 04, 2019

[ Upstream commit 7221b727 ]

The ucast IP table is utilized by some of the L3-specific sysfs attributes
that qeth_l3_create_device_attributes() provides. So initialize the table
_before_ registering the attributes.

Fixes: ebccc739 ("s390/qeth: add missing hash table initializations")
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

35835c39

netfilter: fix NETFILTER_XT_TARGET_TEE dependencies · 48d696e7

由 Arnd Bergmann 提交于 5月 04, 2019

[ Upstream commit d1fa3810 ]

With NETFILTER_XT_TARGET_TEE=y and IP6_NF_IPTABLES=m, we get a link
error when referencing the NF_DUP_IPV6 module:

net/netfilter/xt_TEE.o: In function `tee_tg6':
xt_TEE.c:(.text+0x14): undefined reference to `nf_dup_ipv6'

The problem here is the 'select NF_DUP_IPV6 if IP6_NF_IPTABLES'
that forces NF_DUP_IPV6 to be =m as well rather than setting it
to =y as was intended here. Adding a soft dependency on
IP6_NF_IPTABLES avoids that broken configuration.

Fixes: 5d400a49 ("netfilter: Kconfig: Change select IPv6 dependencies")
Cc: Máté Eckl <ecklm94@gmail.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Link: https://patchwork.ozlabs.org/patch/999498/
Link: https://lore.kernel.org/patchwork/patch/960062/Reported-by: NRandy Dunlap <rdunlap@infradead.org>
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

48d696e7

staging, mt7621-pci: fix build without pci support · 3da6af68

由 Maxim Zhukov 提交于 5月 04, 2019

[ Upstream commit 90cd9bed ]

Add depends on PCI for PCI_MT7621
Signed-off-by: NMaxim Zhukov <mussitantesmortem@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

3da6af68

staging: axis-fifo: add CONFIG_OF dependency · 1124b803

由 Arnd Bergmann 提交于 5月 04, 2019

[ Upstream commit 1beea620 ]

When building without CONFIG_OF, the compiler loses track of the flow
control in axis_fifo_probe(), and thinks that many variables are used
without an initialization even though we actually leave the function
before the first use:

drivers/staging/axis-fifo/axis-fifo.c: In function 'axis_fifo_probe':
drivers/staging/axis-fifo/axis-fifo.c:900:5: error: 'rxd_tdata_width' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  if (rxd_tdata_width != 32) {
     ^
drivers/staging/axis-fifo/axis-fifo.c:907:5: error: 'txd_tdata_width' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  if (txd_tdata_width != 32) {
     ^
drivers/staging/axis-fifo/axis-fifo.c:914:5: error: 'has_tdest' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  if (has_tdest) {
     ^
drivers/staging/axis-fifo/axis-fifo.c:919:5: error: 'has_tid' may be used uninitialized in this function [-Werror=maybe-uninitialized]

When CONFIG_OF is set, this does not happen, and since the driver cannot
work without it, just add that option as a Kconfig dependency.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

1124b803

xsk: fix umem memory leak on cleanup · 230e6be1

由 Björn Töpel 提交于 5月 04, 2019

[ Upstream commit 044175a0 ]

When the umem is cleaned up, the task that created it might already be
gone. If the task was gone, the xdp_umem_release function did not free
the pages member of struct xdp_umem.

It turned out that the task lookup was not needed at all; The code was
a left-over when we moved from task accounting to user accounting [1].

This patch fixes the memory leak by removing the task lookup logic
completely.

[1] https://lore.kernel.org/netdev/20180131135356.19134-3-bjorn.topel@gmail.com/

Link: https://lore.kernel.org/netdev/c1cb2ca8-6a14-3980-8672-f3de0bb38dfd@suse.cz/
Fixes: c0c77d8f ("xsk: add user memory registration support sockopt")
Reported-by: NJiri Slaby <jslaby@suse.cz>
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

230e6be1

qlcnic: Avoid potential NULL pointer dereference · 471412ef

由 Aditya Pakki 提交于 5月 04, 2019

[ Upstream commit 5bf7295f ]

netdev_alloc_skb can fail and return a NULL pointer which is
dereferenced without a check. The patch avoids such a scenario.
Signed-off-by: NAditya Pakki <pakki001@umn.edu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

471412ef

net: stmmac: don't set own bit too early for jumbo frames · 94ee47a9

由 Aaro Koskinen 提交于 5月 04, 2019

[ Upstream commit 80acbed9 ]

Commit 0e80bdc9 ("stmmac: first frame prep at the end of xmit
routine") overlooked jumbo frames when re-ordering the code, and as a
result the own bit was not getting set anymore for the first jumbo frame
descriptor. Commit 487e2e22 ("net: stmmac: Set OWN bit for jumbo
frames") tried to fix this, but now the bit is getting set too early and
the DMA may start while we are still setting up the remaining descriptors.
And with the chain mode the own bit remains still unset.

Fix by setting the own bit at the end of xmit also with jumbo frames.

Fixes: 0e80bdc9 ("stmmac: first frame prep at the end of xmit routine")
Fixes: 487e2e22 ("net: stmmac: Set OWN bit for jumbo frames")
Signed-off-by: NAaro Koskinen <aaro.koskinen@nokia.com>
Acked-by: NJose Abreu <joabreu@synopsys.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

94ee47a9

ieee802154: hwsim: propagate genlmsg_reply return code · c0e431ac

由 Li RongQing 提交于 5月 04, 2019

[ Upstream commit 19b39a25 ]

genlmsg_reply can fail, so propagate its return code
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Signed-off-by: NStefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

c0e431ac

net: ieee802154: fix a potential NULL pointer dereference · 000a5b1d

由 Kangjie Lu 提交于 5月 04, 2019

[ Upstream commit 2795e8c2 ]

In case alloc_ordered_workqueue fails, the fix releases
sources and returns -ENOMEM to avoid NULL pointer dereference.
Signed-off-by: NKangjie Lu <kjlu@umn.edu>
Acked-by: NMichael Hennerich <michael.hennerich@analog.com>
Signed-off-by: NStefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

000a5b1d

s390: limit brk randomization to 32MB · c468aa03

由 Martin Schwidefsky 提交于 5月 04, 2019

[ Upstream commit cd479ecc ]

For a 64-bit process the randomization of the program break is quite
large with 1GB. That is as big as the randomization of the anonymous
mapping base, for a test case started with '/lib/ld64.so.1 <exec>'
it can happen that the heap is placed after the stack. To avoid
this limit the program break randomization to 32MB for 64-bit and
keep 8MB for 31-bit.
Reported-by: NStefan Liebler <stli@linux.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

c468aa03

ARM: dts: bcm283x: Fix hdmi hpd gpio pull · 908f2257

由 Helen Koike 提交于 5月 04, 2019

[ Upstream commit 544e7841 ]

Raspberry pi board model B revison 2 have the hot plug detector gpio
active high (and not low as it was in the dts).
Signed-off-by: NHelen Koike <helen.koike@collabora.com>
Fixes: 49ac67e0 ("ARM: bcm2835: Add VC4 to the device tree.")
Reviewed-by: NEric Anholt <eric@anholt.net>
Signed-off-by: NEric Anholt <eric@anholt.net>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

908f2257

Revert "ACPICA: Clear status of GPEs before enabling them" · e60aafee

由 Rafael J. Wysocki 提交于 5月 04, 2019

commit 2c2a2fb1 upstream.

Revert commit c8b1917c ("ACPICA: Clear status of GPEs before
enabling them") that causes problems with Thunderbolt controllers
to occur if a dock device is connected at init time (the xhci_hcd
and thunderbolt modules crash which prevents peripherals connected
through them from working).

Commit c8b1917c effectively causes commit ecc1165b ("ACPICA:
Dispatch active GPEs at init time") to get undone, so the problem
addressed by commit ecc1165b appears again as a result of it.

Fixes: c8b1917c ("ACPICA: Clear status of GPEs before enabling them")
Link: https://lore.kernel.org/lkml/s5hy33siofw.wl-tiwai@suse.de/T/#u
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1132943Reported-by: NMichael Hirmke <opensuse@mike.franken.de>
Reported-by: NTakashi Iwai <tiwai@suse.de>
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

e60aafee

selinux: use kernel linux/socket.h for genheaders and mdp · 71e217e8

由 Paulo Alcantara 提交于 5月 04, 2019

commit dfbd199a upstream.

When compiling genheaders and mdp from a newer host kernel, the
following error happens:

    In file included from scripts/selinux/genheaders/genheaders.c:18:
    ./security/selinux/include/classmap.h:238:2: error: #error New
    address family defined, please update secclass_map.  #error New
    address family defined, please update secclass_map.  ^~~~~
    make[3]: *** [scripts/Makefile.host:107:
    scripts/selinux/genheaders/genheaders] Error 1 make[2]: ***
    [scripts/Makefile.build:599: scripts/selinux/genheaders] Error 2
    make[1]: *** [scripts/Makefile.build:599: scripts/selinux] Error 2
    make[1]: *** Waiting for unfinished jobs....

Instead of relying on the host definition, include linux/socket.h in
classmap.h to have PF_MAX.

Cc: stable@vger.kernel.org
Signed-off-by: NPaulo Alcantara <paulo@paulo.ac>
Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
[PM: manually merge in mdp.c, subject line tweaks]
Signed-off-by: NPaul Moore <paul@paul-moore.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

71e217e8

Linux 4.19.38 · 26423f9b

由 Greg Kroah-Hartman 提交于 5月 03, 2019

Merge 74 patches from 4.19.38 stable
branch (102 total) beside 28 already merged patches:
db99f12 netfilter: nft_compat: use refcnt_t type for nft_xt reference count
7693bae netfilter: nft_compat: make lists per netns
cb2e343d netfilter: nft_compat: destroy function must not have side effects
8766cc7 ext4: fix some error pointer dereferences
070e34b tipc: handle the err returned from cmd header function
1832b15 loop: do not print warn message if partition scan is successful
ae5e0c7 ipvs: fix warning on unused variable
d5bf783 cifs: fix memory leak in SMB2_read
f7b467a vfio/type1: Limit DMA mappings per container
50cda88 USB: Add new USB LPM helpers
f41d2de USB: Consolidate LPM checks to avoid enabling LPM twice
4d476a0 fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
245a94a sched/deadline: Correctly handle active 0-lag timers
94ad68a NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
f21fae8 tipc: check bearer name with right length in tipc_nl_compat_bearer_enable
a0cb0fa tipc: check link name with right length in tipc_nl_compat_link_set
9101cbe aio: clear IOCB_HIPRI
730198c aio: separate out ring reservation from req allocation
4d67768 aio: use iocb_put() instead of open coding it
d384f8b aio: split out iocb copy from io_submit_one()
a812f7b aio: abstract out io_event filler helper
d6b2615 aio: simplify - and fix - fget/fput for io_submit()
c7f2525 pin iocb through aio.
592ea63 aio: fold lookup_kiocb() into its sole caller
c20202c51 aio: keep io_event in aio_kiocb
aab66df aio: store event at final iocb_put()
e9e4777 Fix aio_poll() races
e875a40 net: hns: Fix WARNING when hns modules installed
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

26423f9b

powerpc/fsl: Add FSL_PPC_BOOK3E as supported arch for nospectre_v2 boot arg · 36e87fb0

由 Diana Craciun 提交于 5月 03, 2019

commit e59f5bd7 upstream.
Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

36e87fb0

net/tls: don't leak IV and record seq when offload fails · f6ce1f08

由 Jakub Kicinski 提交于 5月 03, 2019

[ Upstream commit 12c76861 ]

When device refuses the offload in tls_set_device_offload_rx()
it calls tls_sw_free_resources_rx() to clean up software context
state.

Unfortunately, tls_sw_free_resources_rx() does not free all
the state tls_set_sw_offload() allocated - it leaks IV and
sequence number buffers.  All other code paths which lead to
tls_sw_release_resources_rx() (which tls_sw_free_resources_rx()
calls) free those right before the call.

Avoid the leak by moving freeing of iv and rec_seq into
tls_sw_release_resources_rx().

Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

f6ce1f08

net/tls: avoid potential deadlock in tls_set_device_offload_rx() · f38542cf

由 Jakub Kicinski 提交于 5月 03, 2019

[ Upstream commit 62ef81d5 ]

If device supports offload, but offload fails tls_set_device_offload_rx()
will call tls_sw_free_resources_rx() which (unhelpfully) releases
and reacquires the socket lock.

For a small fix release and reacquire the device_offload_lock.

Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

f38542cf

net/mlx5e: Fix use-after-free after xdp_return_frame · 0c6c88f7

由 Maxim Mikityanskiy 提交于 5月 03, 2019

[ Upstream commit 12fc512f ]

xdp_return_frame releases the frame. It leads to releasing the page, so
it's not allowed to access xdpi.xdpf->len after that, because xdpi.xdpf
is at xdp->data_hard_start after convert_to_xdp_frame. This patch moves
the memory access to precede the return of the frame.

Fixes: 58b99ee3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

0c6c88f7

net/mlx5e: Fix the max MTU check in case of XDP · 76d1e775

由 Maxim Mikityanskiy 提交于 5月 03, 2019

[ Upstream commit d460c271 ]

MLX5E_XDP_MAX_MTU was calculated incorrectly. It didn't account for
NET_IP_ALIGN and MLX5E_HW2SW_MTU, and it also misused MLX5_SKB_FRAG_SZ.
This commit fixes the calculations and adds a brief explanation for the
formula used.

Fixes: a26a5bdf ("net/mlx5e: Restrict the combination of large MTU and XDP")
Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

76d1e775

mlxsw: spectrum: Put MC TCs into DWRR mode · 5032fdbd

由 Petr Machata 提交于 5月 03, 2019

[ Upstream commit f476b3f8 ]

Both Spectrum-1 and Spectrum-2 chips are currently configured such that
pairs of TC n (which is used for UC traffic) and TC n+8 (which is used
for MC traffic) are feeding into the same subgroup. Strict
prioritization is configured between the two TCs, and by enabling
MC-aware mode on the switch, the lower-numbered (UC) TCs are favored
over the higher-numbered (MC) TCs.

On Spectrum-2 however, there is an issue in configuration of the
MC-aware mode. As a result, MC traffic is prioritized over UC traffic.
To work around the issue, configure the MC TCs with DWRR mode (while
keeping the UC TCs in strict mode).

With this patch, the multicast-unicast arbitration results in the same
behavior on both Spectrum-1 and Spectrum-2 chips.

Fixes: 7b819530 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
Signed-off-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

5032fdbd

mlxsw: pci: Reincrease PCI reset timeout · 8cf91392

由 Ido Schimmel 提交于 5月 03, 2019

[ Upstream commit 1ab30301 ]

During driver initialization the driver sends a reset to the device and
waits for the firmware to signal that it is ready to continue.

Commit d2f372ba ("mlxsw: pci: Increase PCI SW reset timeout")
increased the timeout to 13 seconds due to longer PHY calibration in
Spectrum-2 compared to Spectrum-1.

Recently it became apparent that this timeout is too short and therefore
this patch increases it again to a safer limit that will be reduced in
the future.

Fixes: c3ab4354 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Fixes: d2f372ba ("mlxsw: pci: Increase PCI SW reset timeout")
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

8cf91392

team: fix possible recursive locking when add slaves · 5bfd1744

由 Hangbin Liu 提交于 5月 03, 2019

[ Upstream commit 925b0c84 ]

If we add a bond device which is already the master of the team interface,
we will hold the team->lock in team_add_slave() first and then request the
lock in team_set_mac_address() again. The functions are called like:

- team_add_slave()
 - team_port_add()
   - team_port_enter()
     - team_modeop_port_enter()
       - __set_port_dev_addr()
         - dev_set_mac_address()
           - bond_set_mac_address()
             - dev_set_mac_address()
  	       - team_set_mac_address

Although team_upper_dev_link() would check the upper devices but it is
called too late. Fix it by adding a checking before processing the slave.

v2: Do not split the string in netdev_err()

Fixes: 3d249d4c ("net: introduce ethernet teaming device")
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

5bfd1744

stmmac: pci: Adjust IOT2000 matching · b3f4b4eb

由 Su Bao Cheng 提交于 5月 03, 2019

[ Upstream commit e0c1d14a ]

Since there are more IOT2040 variants with identical hardware but
different asset tags, the asset tag matching should be adjusted to
support them.

For the board name "SIMATIC IOT2000", currently there are 2 types of
hardware, IOT2020 and IOT2040. The IOT2020 is identified by its unique
asset tag. Match on it first. If we then match on the board name only,
we will catch all IOT2040 variants. In the future there will be no other
devices with the "SIMATIC IOT2000" DMI board name but different
hardware.
Signed-off-by: NSu Bao Cheng <baocheng.su@siemens.com>
Reviewed-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

b3f4b4eb

net/tls: fix refcount adjustment in fallback · 2e0c4585

由 Jakub Kicinski 提交于 5月 03, 2019

[ Upstream commit 9188d5ca ]

Unlike atomic_add(), refcount_add() does not deal well
with a negative argument.  TLS fallback code reallocates
the skb and is very likely to shrink the truesize, leading to:

[  189.513254] WARNING: CPU: 5 PID: 0 at lib/refcount.c:81 refcount_add_not_zero_checked+0x15c/0x180
 Call Trace:
  refcount_add_checked+0x6/0x40
  tls_enc_skb+0xb93/0x13e0 [tls]

Once wmem_allocated count saturates the application can no longer
send data on the socket.  This is similar to Eric's fixes for GSO,
TCP:
commit 7ec318fe ("tcp: gso: avoid refcount_t warning from tcp_gso_segment()")
and UDP:
commit 575b65bc ("udp: avoid refcount_t saturation in __udp_gso_segment()").

Unlike the GSO case, for TLS fallback it's likely that the skb has
shrunk, so the "likely" annotation is the other way around (likely
branch being "sub").

Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NJohn Hurley <john.hurley@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

2e0c4585

net: stmmac: move stmmac_check_ether_addr() to driver probe · faca24a8

由 Vinod Koul 提交于 5月 03, 2019

[ Upstream commit b561af36 ]

stmmac_check_ether_addr() checks the MAC address and assigns one in
driver open(). In many cases when we create slave netdevice, the dev
addr is inherited from master but the master dev addr maybe NULL at
that time, so move this call to driver probe so that address is
always valid.
Signed-off-by: NXiaofei Shen <xiaofeis@codeaurora.org>
Tested-by: NXiaofei Shen <xiaofeis@codeaurora.org>
Signed-off-by: NSneh Shah <snehshah@codeaurora.org>
Signed-off-by: NVinod Koul <vkoul@kernel.org>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

faca24a8

net/rose: fix unbound loop in rose_loopback_timer() · 53fef40f

由 Eric Dumazet 提交于 5月 03, 2019

[ Upstream commit 0453c682 ]

This patch adds a limit on the number of skbs that fuzzers can queue
into loopback_queue. 1000 packets for rose loopback seems more than enough.

Then, since we now have multiple cpus in most linux hosts,
we also need to limit the number of skbs rose_loopback_timer()
can dequeue at each round.

rose_loopback_queue() can be drop-monitor friendly, calling
consume_skb() or kfree_skb() appropriately.

Finally, use mod_timer() instead of del_timer() + add_timer()

syzbot report was :

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:    0-...!: (10499 ticks this GP) idle=536/1/0x4000000000000002 softirq=103291/103291 fqs=34
rcu:     (t=10500 jiffies g=140321 q=323)
rcu: rcu_preempt kthread starved for 10426 jiffies! g140321 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
rcu: RCU grace-period kthread stack dump:
rcu_preempt     I29168    10      2 0x80000000
Call Trace:
 context_switch kernel/sched/core.c:2877 [inline]
 __schedule+0x813/0x1cc0 kernel/sched/core.c:3518
 schedule+0x92/0x180 kernel/sched/core.c:3562
 schedule_timeout+0x4db/0xfd0 kernel/time/timer.c:1803
 rcu_gp_fqs_loop kernel/rcu/tree.c:1971 [inline]
 rcu_gp_kthread+0x962/0x17b0 kernel/rcu/tree.c:2128
 kthread+0x357/0x430 kernel/kthread.c:253
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
NMI backtrace for cpu 0
CPU: 0 PID: 7632 Comm: kworker/0:4 Not tainted 5.1.0-rc5+ #172
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events iterate_cleanup_work
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1223
 print_cpu_stall kernel/rcu/tree.c:1360 [inline]
 check_cpu_stall kernel/rcu/tree.c:1434 [inline]
 rcu_pending kernel/rcu/tree.c:3103 [inline]
 rcu_sched_clock_irq.cold+0x500/0xa4a kernel/rcu/tree.c:2544
 update_process_times+0x32/0x80 kernel/time/timer.c:1635
 tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
 tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
 __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
 __hrtimer_run_queues+0x33e/0xde0 kernel/time/hrtimer.c:1451
 hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
 smp_apic_timer_interrupt+0x120/0x570 arch/x86/kernel/apic/apic.c:1060
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50 kernel/kcov.c:95
Code: 89 25 b4 6e ec 08 41 bc f4 ff ff ff e8 cd 5d ea ff 48 c7 05 9e 6e ec 08 00 00 00 00 e9 a4 e9 ff ff 90 90 90 90 90 90 90 90 90 <55> 48 89 e5 48 8b 75 08 65 48 8b 04 25 00 ee 01 00 65 8b 15 c8 60
RSP: 0018:ffff8880ae807ce0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: ffff88806fd40640 RBX: dffffc0000000000 RCX: ffffffff863fbc56
RDX: 0000000000000100 RSI: ffffffff863fbc1d RDI: ffff88808cf94228
RBP: ffff8880ae807d10 R08: ffff88806fd40640 R09: ffffed1015d00f8b
R10: ffffed1015d00f8a R11: 0000000000000003 R12: ffff88808cf941c0
R13: 00000000fffff034 R14: ffff8882166cd840 R15: 0000000000000000
 rose_loopback_timer+0x30d/0x3f0 net/rose/rose_loopback.c:91
 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
 expire_timers kernel/time/timer.c:1362 [inline]
 __run_timers kernel/time/timer.c:1681 [inline]
 __run_timers kernel/time/timer.c:1649 [inline]
 run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
 __do_softirq+0x266/0x95a kernel/softirq.c:293
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

53fef40f

net: rds: exchange of 8K and 1M pool · edb7ec7c

由 Zhu Yanjun 提交于 5月 03, 2019

[ Upstream commit 4b9fc714 ]

Before the commit 490ea596 ("RDS: IB: move FMR code to its own file"),
when the dirty_count is greater than 9/10 of max_items of 8K pool,
1M pool is used, Vice versa. After the commit 490ea596 ("RDS: IB: move
FMR code to its own file"), the above is removed. When we make the
following tests.

Server:
  rds-stress -r 1.1.1.16 -D 1M

Client:
  rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M

The following will appear.
"
connecting to 1.1.1.16:4000
negotiated options, tasks will start in 2 seconds
Starting up..header from 1.1.1.166:4001 to id 4001 bogus
..
tsks  tx/s  rx/s tx+rx K/s  mbi K/s  mbo K/s tx us/c  rtt us
cpu %
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
...
"
So this exchange between 8K and 1M pool is added back.

Fixes: commit 490ea596 ("RDS: IB: move FMR code to its own file")
Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

edb7ec7c

net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query · ff68cc2e

由 Erez Alfasi 提交于 5月 03, 2019

[ Upstream commit ace329f4 ]

Querying EEPROM high pages data for SFP module is currently
not supported by our driver and yet queried, resulting in
invalid FW queries.

Set the EEPROM ethtool data length to 256 for SFP module will
limit the reading for page 0 only and prevent invalid FW queries.

Fixes: bb64143e ("net/mlx5e: Add ethtool support for dump module EEPROM")
Signed-off-by: NErez Alfasi <ereza@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

ff68cc2e

mlxsw: spectrum: Fix autoneg status in ethtool · 86cdfa6a

由 Amit Cohen 提交于 5月 03, 2019

[ Upstream commit 151f0ddd ]

If link is down and autoneg is set to on/off, the status in ethtool does
not change.

The reason is when the link is down the function returns with zero
before changing autoneg value.

Move the checking of link state (up/down) to be performed after setting
autoneg value, in order to be sure that autoneg will change in any case.

Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: NAmit Cohen <amitc@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

86cdfa6a

ipv4: set the tcp_min_rtt_wlen range from 0 to one day · 337b48f6

由 ZhangXiaoxu 提交于 5月 03, 2019

[ Upstream commit 19fad20d ]

There is a UBSAN report as below:
UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
signed integer overflow:
2147483647 * 1000 cannot be represented in type 'int'
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e3 #1
Call Trace:
 <IRQ>
 dump_stack+0x8c/0xba
 ubsan_epilogue+0x11/0x60
 handle_overflow+0x12d/0x170
 ? ttwu_do_wakeup+0x21/0x320
 __ubsan_handle_mul_overflow+0x12/0x20
 tcp_ack_update_rtt+0x76c/0x780
 tcp_clean_rtx_queue+0x499/0x14d0
 tcp_ack+0x69e/0x1240
 ? __wake_up_sync_key+0x2c/0x50
 ? update_group_capacity+0x50/0x680
 tcp_rcv_established+0x4e2/0xe10
 tcp_v4_do_rcv+0x22b/0x420
 tcp_v4_rcv+0xfe8/0x1190
 ip_protocol_deliver_rcu+0x36/0x180
 ip_local_deliver+0x15b/0x1a0
 ip_rcv+0xac/0xd0
 __netif_receive_skb_one_core+0x7f/0xb0
 __netif_receive_skb+0x33/0xc0
 netif_receive_skb_internal+0x84/0x1c0
 napi_gro_receive+0x2a0/0x300
 receive_buf+0x3d4/0x2350
 ? detach_buf_split+0x159/0x390
 virtnet_poll+0x198/0x840
 ? reweight_entity+0x243/0x4b0
 net_rx_action+0x25c/0x770
 __do_softirq+0x19b/0x66d
 irq_exit+0x1eb/0x230
 do_IRQ+0x7a/0x150
 common_interrupt+0xf/0xf
 </IRQ>

It can be reproduced by:
  echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen

Fixes: f6722583 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

337b48f6

ipv4: add sanity checks in ipv4_link_failure() · 2fe8d032

由 Eric Dumazet 提交于 5月 03, 2019

[ Upstream commit 20ff83f1 ]

Before calling __ip_options_compile(), we need to ensure the network
header is a an IPv4 one, and that it is already pulled in skb->head.

RAW sockets going through a tunnel can end up calling ipv4_link_failure()
with total garbage in the skb, or arbitrary lengthes.

syzbot report :

BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:355 [inline]
BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123
Write of size 69 at addr ffff888096abf068 by task syz-executor.4/9204

CPU: 0 PID: 9204 Comm: syz-executor.4 Not tainted 5.1.0-rc5+ #77
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x123/0x190 mm/kasan/generic.c:191
 memcpy+0x38/0x50 mm/kasan/common.c:133
 memcpy include/linux/string.h:355 [inline]
 __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123
 __icmp_send+0x725/0x1400 net/ipv4/icmp.c:695
 ipv4_link_failure+0x29f/0x550 net/ipv4/route.c:1204
 dst_link_failure include/net/dst.h:427 [inline]
 vti6_xmit net/ipv6/ip6_vti.c:514 [inline]
 vti6_tnl_xmit+0x10d4/0x1c0c net/ipv6/ip6_vti.c:553
 __netdev_start_xmit include/linux/netdevice.h:4414 [inline]
 netdev_start_xmit include/linux/netdevice.h:4423 [inline]
 xmit_one net/core/dev.c:3292 [inline]
 dev_hard_start_xmit+0x1b2/0x980 net/core/dev.c:3308
 __dev_queue_xmit+0x271d/0x3060 net/core/dev.c:3878
 dev_queue_xmit+0x18/0x20 net/core/dev.c:3911
 neigh_direct_output+0x16/0x20 net/core/neighbour.c:1527
 neigh_output include/net/neighbour.h:508 [inline]
 ip_finish_output2+0x949/0x1740 net/ipv4/ip_output.c:229
 ip_finish_output+0x73c/0xd50 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip_output+0x21f/0x670 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:444 [inline]
 NF_HOOK include/linux/netfilter.h:289 [inline]
 raw_send_hdrinc net/ipv4/raw.c:432 [inline]
 raw_sendmsg+0x1d2b/0x2f20 net/ipv4/raw.c:663
 inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg+0xdd/0x130 net/socket.c:661
 sock_write_iter+0x27c/0x3e0 net/socket.c:988
 call_write_iter include/linux/fs.h:1866 [inline]
 new_sync_write+0x4c7/0x760 fs/read_write.c:474
 __vfs_write+0xe4/0x110 fs/read_write.c:487
 vfs_write+0x20c/0x580 fs/read_write.c:549
 ksys_write+0x14f/0x2d0 fs/read_write.c:599
 __do_sys_write fs/read_write.c:611 [inline]
 __se_sys_write fs/read_write.c:608 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:608
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458c29
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f293b44bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458c29
RDX: 0000000000000014 RSI: 00000000200002c0 RDI: 0000000000000003
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f293b44c6d4
R13: 00000000004c8623 R14: 00000000004ded68 R15: 00000000ffffffff

The buggy address belongs to the page:
page:ffffea00025aafc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x1fffc0000000000()
raw: 01fffc0000000000 0000000000000000 ffffffff025a0101 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888096abef80: 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2
 ffff888096abf000: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888096abf080: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
                         ^
 ffff888096abf100: 00 00 00 00 f1 f1 f1 f1 00 00 f3 f3 00 00 00 00
 ffff888096abf180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Fixes: ed0de45a ("ipv4: recompile ip options in ipv4_link_failure")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

2fe8d032

x86/fpu: Don't export __kernel_fpu_{begin, end}() · f25ea67d

由 Sebastian Andrzej Siewior 提交于 5月 03, 2019

commit 12209993 upstream.

There is one user of __kernel_fpu_begin() and before invoking it,
it invokes preempt_disable(). So it could invoke kernel_fpu_begin()
right away. The 32bit version of arch_efi_call_virt_setup() and
arch_efi_call_virt_teardown() does this already.

The comment above *kernel_fpu*() claims that before invoking
__kernel_fpu_begin() preemption should be disabled and that KVM is a
good example of doing it. Well, KVM doesn't do that since commit

  f775b13e ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")

so it is not an example anymore.

With EFI gone as the last user of __kernel_fpu_{begin|end}(), both can
be made static and not exported anymore.
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NRik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181129150210.2k4mawt37ow6c2vq@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

f25ea67d

mm: Fix warning in insert_pfn() · 80ded77d

由 Jan Kara 提交于 5月 03, 2019

commit f2c57d91 upstream.

In DAX mode a write pagefault can race with write(2) in the following
way:

CPU0                            CPU1
                                write fault for mapped zero page (hole)
dax_iomap_rw()
  iomap_apply()
    xfs_file_iomap_begin()
      - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - invalidates radix tree entries in given range
                                dax_iomap_pte_fault()
                                  grab_mapping_entry()
                                    - no entry found, creates empty
                                  ...
                                  xfs_file_iomap_begin()
                                    - finds already allocated block
                                  ...
                                  vmf_insert_mixed_mkwrite()
                                    - WARNs and does nothing because there
                                      is still zero page mapped in PTE
        unmap_mapping_pages()

This race results in WARN_ON from insert_pfn() and is occasionally
triggered by fstest generic/344. Note that the race is otherwise
harmless as before write(2) on CPU0 is finished, we will invalidate page
tables properly and thus user of mmap will see modified data from
write(2) from that point on. So just restrict the warning only to the
case when the PFN in PTE is not zero page.

Link: http://lkml.kernel.org/r/20180824154542.26872-1-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

80ded77d

x86/retpolines: Disable switch jump tables when retpolines are enabled · a3feaabc

由 Daniel Borkmann 提交于 5月 03, 2019

commit a9d57ef1 upstream.

Commit ce02ef06 ("x86, retpolines: Raise limit for generating indirect
calls from switch-case") raised the limit under retpolines to 20 switch
cases where gcc would only then start to emit jump tables, and therefore
effectively disabling the emission of slow indirect calls in this area.

After this has been brought to attention to gcc folks [0], Martin Liska
has then fixed gcc to align with clang by avoiding to generate switch jump
tables entirely under retpolines. This is taking effect in gcc starting
from stable version 8.4.0. Given kernel supports compilation with older
versions of gcc where the fix is not being available or backported anymore,
we need to keep the extra KBUILD_CFLAGS around for some time and generally
set the -fno-jump-tables to align with what more recent gcc is doing
automatically today.

More than 20 switch cases are not expected to be fast-path critical, but
it would still be good to align with gcc behavior for versions < 8.4.0 in
order to have consistency across supported gcc versions. vmlinux size is
slightly growing by 0.27% for older gcc. This flag is only set to work
around affected gcc, no change for clang.

  [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952Suggested-by: NMartin Liska <mliska@suse.cz>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Björn Töpel<bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Link: https://lkml.kernel.org/r/20190325135620.14882-1-daniel@iogearbox.netSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

a3feaabc

x86, retpolines: Raise limit for generating indirect calls from switch-case · ce8f0945

由 Daniel Borkmann 提交于 5月 03, 2019

commit ce02ef06 upstream.

From networking side, there are numerous attempts to get rid of indirect
calls in fast-path wherever feasible in order to avoid the cost of
retpolines, for example, just to name a few:

  * 283c16a2 ("indirect call wrappers: helpers to speed-up indirect calls of builtin")
  * aaa5d90b ("net: use indirect call wrappers at GRO network layer")
  * 028e0a47 ("net: use indirect call wrappers at GRO transport layer")
  * 356da6d0 ("dma-mapping: bypass indirect calls for dma-direct")
  * 09772d92 ("bpf: avoid retpoline for lookup/update/delete calls on maps")
  * 10870dd8 ("netfilter: nf_tables: add direct calls for all builtin expressions")
  [...]

Recent work on XDP from Björn and Magnus additionally found that manually
transforming the XDP return code switch statement with more than 5 cases
into if-else combination would result in a considerable speedup in XDP
layer due to avoidance of indirect calls in CONFIG_RETPOLINE enabled
builds. On i40e driver with XDP prog attached, a 20-26% speedup has been
observed [0]. Aside from XDP, there are many other places later in the
networking stack's critical path with similar switch-case
processing. Rather than fixing every XDP-enabled driver and locations in
stack by hand, it would be good to instead raise the limit where gcc would
emit expensive indirect calls from the switch under retpolines and stick
with the default as-is in case of !retpoline configured kernels. This would
also have the advantage that for archs where this is not necessary, we let
compiler select the underlying target optimization for these constructs and
avoid potential slow-downs by if-else hand-rewrite.

In case of gcc, this setting is controlled by case-values-threshold which
has an architecture global default that selects 4 or 5 (latter if target
does not have a case insn that compares the bounds) where some arch back
ends like arm64 or s390 override it with their own target hooks, for
example, in gcc commit db7a90aa0de5 ("S/390: Disable prediction of indirect
branches") the threshold pretty much disables jump tables by limit of 20
under retpoline builds.  Comparing gcc's and clang's default code
generation on x86-64 under O2 level with retpoline build results in the
following outcome for 5 switch cases:

* gcc with -mindirect-branch=thunk-inline -mindirect-branch-register:

  # gdb -batch -ex 'disassemble dispatch' ./c-switch
  Dump of assembler code for function dispatch:
   0x0000000000400be0 <+0>:     cmp    $0x4,%edi
   0x0000000000400be3 <+3>:     ja     0x400c35 <dispatch+85>
   0x0000000000400be5 <+5>:     lea    0x915f8(%rip),%rdx        # 0x4921e4
   0x0000000000400bec <+12>:    mov    %edi,%edi
   0x0000000000400bee <+14>:    movslq (%rdx,%rdi,4),%rax
   0x0000000000400bf2 <+18>:    add    %rdx,%rax
   0x0000000000400bf5 <+21>:    callq  0x400c01 <dispatch+33>
   0x0000000000400bfa <+26>:    pause
   0x0000000000400bfc <+28>:    lfence
   0x0000000000400bff <+31>:    jmp    0x400bfa <dispatch+26>
   0x0000000000400c01 <+33>:    mov    %rax,(%rsp)
   0x0000000000400c05 <+37>:    retq
   0x0000000000400c06 <+38>:    nopw   %cs:0x0(%rax,%rax,1)
   0x0000000000400c10 <+48>:    jmpq   0x400c90 <fn_3>
   0x0000000000400c15 <+53>:    nopl   (%rax)
   0x0000000000400c18 <+56>:    jmpq   0x400c70 <fn_2>
   0x0000000000400c1d <+61>:    nopl   (%rax)
   0x0000000000400c20 <+64>:    jmpq   0x400c50 <fn_1>
   0x0000000000400c25 <+69>:    nopl   (%rax)
   0x0000000000400c28 <+72>:    jmpq   0x400c40 <fn_0>
   0x0000000000400c2d <+77>:    nopl   (%rax)
   0x0000000000400c30 <+80>:    jmpq   0x400cb0 <fn_4>
   0x0000000000400c35 <+85>:    push   %rax
   0x0000000000400c36 <+86>:    callq  0x40dd80 <abort>
  End of assembler dump.

* clang with -mretpoline emitting search tree:

  # gdb -batch -ex 'disassemble dispatch' ./c-switch
  Dump of assembler code for function dispatch:
   0x0000000000400b30 <+0>:     cmp    $0x1,%edi
   0x0000000000400b33 <+3>:     jle    0x400b44 <dispatch+20>
   0x0000000000400b35 <+5>:     cmp    $0x2,%edi
   0x0000000000400b38 <+8>:     je     0x400b4d <dispatch+29>
   0x0000000000400b3a <+10>:    cmp    $0x3,%edi
   0x0000000000400b3d <+13>:    jne    0x400b52 <dispatch+34>
   0x0000000000400b3f <+15>:    jmpq   0x400c50 <fn_3>
   0x0000000000400b44 <+20>:    test   %edi,%edi
   0x0000000000400b46 <+22>:    jne    0x400b5c <dispatch+44>
   0x0000000000400b48 <+24>:    jmpq   0x400c20 <fn_0>
   0x0000000000400b4d <+29>:    jmpq   0x400c40 <fn_2>
   0x0000000000400b52 <+34>:    cmp    $0x4,%edi
   0x0000000000400b55 <+37>:    jne    0x400b66 <dispatch+54>
   0x0000000000400b57 <+39>:    jmpq   0x400c60 <fn_4>
   0x0000000000400b5c <+44>:    cmp    $0x1,%edi
   0x0000000000400b5f <+47>:    jne    0x400b66 <dispatch+54>
   0x0000000000400b61 <+49>:    jmpq   0x400c30 <fn_1>
   0x0000000000400b66 <+54>:    push   %rax
   0x0000000000400b67 <+55>:    callq  0x40dd20 <abort>
  End of assembler dump.

  For sake of comparison, clang without -mretpoline:

  # gdb -batch -ex 'disassemble dispatch' ./c-switch
  Dump of assembler code for function dispatch:
   0x0000000000400b30 <+0>:	cmp    $0x4,%edi
   0x0000000000400b33 <+3>:	ja     0x400b57 <dispatch+39>
   0x0000000000400b35 <+5>:	mov    %edi,%eax
   0x0000000000400b37 <+7>:	jmpq   *0x492148(,%rax,8)
   0x0000000000400b3e <+14>:	jmpq   0x400bf0 <fn_0>
   0x0000000000400b43 <+19>:	jmpq   0x400c30 <fn_4>
   0x0000000000400b48 <+24>:	jmpq   0x400c10 <fn_2>
   0x0000000000400b4d <+29>:	jmpq   0x400c20 <fn_3>
   0x0000000000400b52 <+34>:	jmpq   0x400c00 <fn_1>
   0x0000000000400b57 <+39>:	push   %rax
   0x0000000000400b58 <+40>:	callq  0x40dcf0 <abort>
  End of assembler dump.

Raising the cases to a high number (e.g. 100) will still result in similar
code generation pattern with clang and gcc as above, in other words clang
generally turns off jump table emission by having an extra expansion pass
under retpoline build to turn indirectbr instructions from their IR into
switch instructions as a built-in -mno-jump-table lowering of a switch (in
this case, even if IR input already contained an indirect branch).

For gcc, adding --param=case-values-threshold=20 as in similar fashion as
s390 in order to raise the limit for x86 retpoline enabled builds results
in a small vmlinux size increase of only 0.13% (before=18,027,528
after=18,051,192). For clang this option is ignored due to i) not being
needed as mentioned and ii) not having above cmdline
parameter. Non-retpoline-enabled builds with gcc continue to use the
default case-values-threshold setting, so nothing changes here.

[0] https://lore.kernel.org/netdev/20190129095754.9390-1-bjorn.topel@gmail.com/
    and "The Path to DPDK Speeds for AF_XDP", LPC 2018, networking track:
  - http://vger.kernel.org/lpc_net2018_talks/lpc18_pres_af_xdp_perf-v3.pdf
  - http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdfSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: netdev@vger.kernel.org
Cc: David S. Miller <davem@davemloft.net>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190221221941.29358-1-daniel@iogearbox.netSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

ce8f0945

aio: initialize kiocb private in case any filesystems expect it. · a3669a09

由 Mike Marshall 提交于 5月 03, 2019

commit ec51f8ee upstream.

A recent optimization had left private uninitialized.

Fixes: 2bc4ca9b ("aio: don't zero entire aio_kiocb aio_get_req()")
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMike Marshall <hubcap@omnibond.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

a3669a09

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功