提交 · 5ed1b15fed2dd29cd448393d30d617923f70cd91 · openanolis / cloud-kernel

12 10月, 2019 34 次提交

efi: Permit multiple entries in persistent memreserve data structure · 5ed1b15f

由 Ard Biesheuvel 提交于 11月 29, 2018

commit 5f0b0ecf043a5319e729c11a53bc8294df12dab3 upstream

In preparation of updating efi_mem_reserve_persistent() to cause less
fragmentation when dealing with many persistent reservations, update
the struct definition and the code that handles it currently so it
can describe an arbitrary number of reservations using a single linked
list entry. The actual optimization will be implemented in a subsequent
patch.
Tested-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arend van Spriel <arend.vanspriel@broadcom.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Snowberg <eric.snowberg@oracle.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jon Hunter <jonathanh@nvidia.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: YiFei Zhu <zhuyifei1999@gmail.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20181129171230.18699-10-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

5ed1b15f

efi: Prevent GICv3 WARN() by mapping the memreserve table before first use · 9a407ede

由 Ard Biesheuvel 提交于 11月 23, 2018

commit 976b489120cdab2b1b3a41ffa14661db43d58190 upstream

Mapping the MEMRESERVE EFI configuration table from an early initcall
is too late: the GICv3 ITS code that creates persistent reservations
for the boot CPU's LPI tables is invoked from init_IRQ(), which runs
much earlier than the handling of the initcalls. This results in a
WARN() splat because the LPI tables cannot be reserved persistently,
which will result in silent memory corruption after a kexec reboot.

So instead, invoke the initialization performed by the initcall from
efi_mem_reserve_persistent() itself as well, but keep the initcall so
that the init is guaranteed to have been called before SMP boot.
Tested-by: NMarc Zyngier <marc.zyngier@arm.com>
Tested-by: NJan Glauber <jglauber@cavium.com>
Tested-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 63eb322d89c8 ("efi: Permit calling efi_mem_reserve_persistent() ...")
Link: http://lkml.kernel.org/r/20181123215132.7951-2-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

9a407ede

efi: Permit calling efi_mem_reserve_persistent() from atomic context · 41a6a528

由 Ard Biesheuvel 提交于 11月 14, 2018

commit 63eb322d89c8505af9b4a3d703e85e42281ebaa0 upstream

Currently, efi_mem_reserve_persistent() may not be called from atomic
context, since both the kmalloc() call and the memremap() call may
sleep.

The kmalloc() call is easy enough to fix, but the memremap() call
needs to be moved into an init hook since we cannot control the
memory allocation behavior of memremap() at the call site.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20181114175544.12860-6-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

41a6a528

efi/arm: libstub: add a root memreserve config table · 15329929

由 Ard Biesheuvel 提交于 9月 21, 2018

commit b844470f22061e8cd646cb355e85d2f518b2c913 upstream

Installing UEFI configuration tables can only be done before calling
ExitBootServices(), so if we want to use the new MEMRESRVE config table
from the kernel proper, we need to install a dummy entry from the stub.
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

15329929

efi: add API to reserve memory persistently across kexec reboot · f62da2f7

由 Ard Biesheuvel 提交于 9月 21, 2018

commit a23d3bb05ccbd815c79293d2207fedede0b3515d upstream

Add kernel plumbing to reserve memory regions persistently on a EFI
system by adding entries to the MEMRESERVE linked list.
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

f62da2f7

efi: honour memory reservations passed via a linux specific config table · 66e33537

由 Ard Biesheuvel 提交于 9月 21, 2018

commit 71e0940d52e107748b270213a01d3b1546657d74 upstream

In order to allow the OS to reserve memory persistently across a
kexec, introduce a Linux-specific UEFI configuration table that
points to the head of a linked list in memory, allowing each kernel
to add list items describing memory regions that the next kernel
should treat as reserved.

This is useful, e.g., for GICv3 based ARM systems that cannot disable
DMA access to the LPI tables, forcing them to reuse the same memory
region again after a kexec reboot.
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

66e33537

hinic: panic when reading back he IRQ affinity hint · 49367293

由 Zou Cao 提交于 7月 12, 2019

It is wrong for using stack variable to store the current code sets an
affinity hint, it will cause a panic or returning corrupt data.

So this patch moves the mask local variable into hinic_rq struct
to avoid this situation.

Backtrace:
Internal error: Oops: 96000007 [#1] SMP
Process irqbalance (pid: 1464, stack limit = 0x000000009bc2bec4)
CPU: 35 PID: 1464 Comm: irqbalance Tainted: G        W
pstate: 00400089 (nzcv daIf +PAN -UAO)
pc : __memcpy+0x44/0x180
lr : irq_affinity_hint_proc_show+0x9c/0x100
sp : ffff00002cb7bb60
x29: ffff00002cb7bb60 x28: 0000ffff9f6d9000
x27: ffff803f27cef9c0 x26: ffff00002cb7be30
x25: 0000000000000400 x24: ffff803f290e7000
x23: 0000000000000000 x22: ffff803f27cef980
x21: ffff000009009000 x20: ffff802fb9f01000
x19: ffff00002cb7bba8 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 0000000000000000 x6 : ffff00002cb7bba8
x5 : ffff802fb9f01000 x4 : 0000000000000008
x3 : ffff802fb9f01194 x2 : 0000000000000078
x1 : ffff000027d2b8b8 x0 : ffff00002cb7bba8
Call trace:
	__memcpy+0x44/0x180
	seq_read+0x1b4/0x45c
	proc_reg_read+0x7c/0xb8
	__vfs_read+0x58/0x190
	vfs_read+0x94/0x154
	ksys_read+0x68/0xd8
	__arm64_sys_read+0x28/0x34
	el0_svc_common+0xe8/0x19c
	el0_svc_handler+0x78/0x94
	el0_svc+0x8/0xc
Code: 36100064 b8404423 b80044c3 36180064 (f8408423)

---[ end trace b2cae62a9c2d153f ]---
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

49367293

ACPI / APD: Add clock frequency for Hisilicon Hip08 SPI controller · 068be897

由 Jay Fang 提交于 12月 03, 2018

commit 56131d6d8638b7cb6feee67a8794b3dfa626396e upstream

The SPI clock frequency of Designware IP for Hisilicon Hip08 is 250M.

The ACPI ID used is "HISI0173".
Signed-off-by: NJay Fang <f.fangjian@huawei.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

068be897

arm/arm64: KVM: Enable 32 bits kvm vcpu events support · 76d210a9

由 Dongjiu Geng 提交于 10月 13, 2018

commit 58bf437ff64eac8aca606e42d7e4623e40b61fa1 upstream

The commit 539aee0e ("KVM: arm64: Share the parts of
get/set events useful to 32bit") shares the get/set events
helper for arm64 and arm32, but forgot to share the cap
extension code.

User space will check whether KVM supports vcpu events by
checking the KVM_CAP_VCPU_EVENTS extension
Acked-by: NJames Morse <james.morse@arm.com>
Reviewed-by : Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: NDongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

76d210a9

spi: dw-mmio: add ACPI support · b83dbea9

由 Jay Fang 提交于 12月 03, 2018

commit 32215a6c6beb8dcda4bb0759b04ce3c30927963b upstream

The Hisilicon Hip08 platform, that uses ACPI, has this controller.
Let's add ACPI support for DW SPI MMIO-based host.

The ACPI ID used is "HISI0173" for the Designware SPI controller of
Hisilicon Hip08 platform.
Signed-off-by: NJay Fang <f.fangjian@huawei.com>
Signed-off-by: NMark Brown <broonie@kernel.org>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

b83dbea9

arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension() · 351b99f3

由 Dongjiu Geng 提交于 10月 13, 2018

commit 375bdd3b5d4f7cf146f0df1488b4671b141dd799 upstream

Rename kvm_arch_dev_ioctl_check_extension() to
kvm_arch_vm_ioctl_check_extension(), because it does
not have any relationship with device.

Renaming this function can make code readable.

Cc: James Morse <james.morse@arm.com>
Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: NDongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

351b99f3

ACPI / CPPC: Add a helper to get desired performance · c7cd0fd4

由 Xiongfeng Wang 提交于 2月 17, 2019

commit 1757d05f3112acc5c0cdbcccad3afdee99655bf9 upstream

This patch add a helper to get the value of desired performance
register.
Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
[ rjw: More white space ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

c7cd0fd4

cpufreq / cppc: Work around for Hisilicon CPPC cpufreq · 0e270e23

由 Xiongfeng Wang 提交于 2月 17, 2019

commit 6c8d750f9784cef32a8cffdad74c8a351b4ca3a6 upstream

Hisilicon chips do not support delivered performance counter register
and reference performance counter register. But the platform can
calculate the real performance using its own method. We reuse the
desired performance register to store the real performance calculated by
the platform. After the platform finished the frequency adjust, it gets
the real performance and writes it into desired performance register. Os
can use it to calculate the real frequency.
Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
[ rjw: Drop unnecessary braces ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

0e270e23

irqchip/gic-v3-its: Allow use of LPI tables in reserved memory · 12f5096a

由 Marc Zyngier 提交于 7月 27, 2018

commit 5e2c9f9a627772672accd80fa15359c0de6aa894 upstream

If the LPI tables have been reserved with the EFI reservation
mechanism, we assume that these tables are safe to use even
when we find the redistributors to have LPIs enabled at
boot time, meaning that kexec can now work with GICv3.

You're welcome.
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Tested-by: NBhupesh Sharma <bhsharma@redhat.com>
Tested-by: NLei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

12f5096a

irqchip/gic-v3-its: Register LPI tables with EFI config table · 776f6fc2

由 Marc Zyngier 提交于 7月 27, 2018

commit 3fb68faee8676900f896d1615442aeca36e5f940 upstream

Upon enabling a redistributor, let's register the allocated tables
with the EFI table that tracks the memory reservations.
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Tested-by: NBhupesh Sharma <bhsharma@redhat.com>
Tested-by: NLei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

776f6fc2

irqchip/gic-v3-its: Check that all RDs have the same property table · e60da1bc

由 Marc Zyngier 提交于 7月 27, 2018

commit f842ca8e9c8a80d07f5589536311250d7d6018f9 upstream

If booting with LPIs enabled, all the redistributors must have the
exact same property table. No ifs, no buts.
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Tested-by: NBhupesh Sharma <bhsharma@redhat.com>
Tested-by: NLei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

e60da1bc

irqchip/gic-v3-its: Use pre-programmed redistributor tables with kdump kernels · 50e0649e

由 Marc Zyngier 提交于 6月 26, 2018

commit c6e2ccb66d0c3b4fffc59932585e9f709ad59003 upstream

If using a kdump kernel, and that we cannot disable LPIs to install
our own tables, let's switch to using the already allocated tables.

This means that we'll change some of the initial kernel's memory,
but at least we'll be able to have LPIs in this secondary kernel.
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Tested-by: NBhupesh Sharma <bhsharma@redhat.com>
Tested-by: NLei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

50e0649e

irqchip/gic-v3-its: Allow use of pre-programmed LPI tables · fc58b2a4

由 Marc Zyngier 提交于 7月 27, 2018

commit c440a9d9d113b9b3cd99bb5096c4aa47d515e463 upstream

In order to cope with kexec and GICv3, let's try and spot when
we're booting with LPIs already enabled, and the tables already
programmed into the redistributors.

This code is currently guarded by a predicate that is always false,
meaning this is not functionnal just yet.
Reviewed-by: NJulien Thierry <julien.thierry@arm.com>
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Tested-by: NBhupesh Sharma <bhsharma@redhat.com>
Tested-by: NLei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

fc58b2a4

irqchip/gic-v3-its: Keep track of property table's PA and VA · 04d13e90

由 Marc Zyngier 提交于 7月 27, 2018

commit e1a2e2010ba9d3c765b2e37a7ae8b332564716f1 upstream

We're currently only tracking the page allocated to contain the
property table by its struct page. In the future, it is going to
be convenient to track both PA and VA for that page instead. Let's
do that.
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Tested-by: NBhupesh Sharma <bhsharma@redhat.com>
Tested-by: NLei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

04d13e90

irqchip/gic-v3-its: Move pending table allocation to init time · 2dd1eb0b

由 Marc Zyngier 提交于 7月 27, 2018

commit 11e37d357f6ba7a9af850a872396082cc0a0001f upstream

Pending tables for the redistributors are currently allocated
one at a time as each CPU boots. This is causing some grief
for Linux/RT (allocation from within a CPU hotplug notifier is
frown upon).

Let's move this allocation to take place at init time, when we
only have a single CPU. It means we're allocating memory for CPUs
that are not online yet, but most system will boot all of their
CPUs anyway, so that's not completely wasted.
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Tested-by: NBhupesh Sharma <bhsharma@redhat.com>
Tested-by: NLei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

2dd1eb0b

irqchip/gic-v3-its: Split property table clearing from allocation · f817dfef

由 Marc Zyngier 提交于 7月 27, 2018

commit 053be4854f9bcceba99cdfa0c89acc4696852c3f upstream

As we're going to reuse some pre-allocated memory for the property
table, split out the zeroing of that table into a separate function
for later use.
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Tested-by: NBhupesh Sharma <bhsharma@redhat.com>
Tested-by: NLei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

f817dfef

irqchip/gic-v3-its: Change initialization ordering for LPIs · 8e56d955

由 Marc Zyngier 提交于 7月 27, 2018

commit d38a71c5452529fd3326b0ae488292e5fbd8d2a1 upstream

We currently initialize the LPIs (and the ITS) fairly early, even
before the SMP support and the CPU interface. This is a bit odd
(as LPIs are not exactly crutial for the early boot process),
and is going to cause issues when reorganizing the probing code.

Let's move this initialization later.
Reviewed-by: NJulien Thierry <julien.thierry@arm.com>
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Tested-by: NBhupesh Sharma <bhsharma@redhat.com>
Tested-by: NLei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

8e56d955

iommu/arm-smmu-v3: Remove unnecessary wrapper function · 669d3d01

由 Andrew Murray 提交于 10月 10, 2018

commit 5e731073bc0a4a53a213412dbd33982d829560f1 upstream

Simplify the code by removing an unnecessary wrapper function.

This was left behind by commit 2f657add
("iommu/arm-smmu-v3: Specialise CMD_SYNC handling")
Signed-off-by: NAndrew Murray <andrew.murray@arm.com>
Reviewed-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

669d3d01

iommu/arm-smmu: Support non-strict mode · 815a3acf

由 Robin Murphy 提交于 9月 20, 2018

commit 44f6876a00e83df5fd28681502b19b0f51e4a3c6 upstream

All we need is to wire up .flush_iotlb_all properly and implement the
domain attribute, and iommu-dma and io-pgtable will do the rest for us.
The only real subtlety is documenting the barrier semantics we're
introducing between io-pgtable and the drivers for non-strict flushes.
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

815a3acf

iommu/io-pgtable-arm-v7s: Add support for non-strict mode · 397aa2bf

由 Robin Murphy 提交于 9月 20, 2018

commit b2dfeba654cb08db327d0ed4547b66c2f8fce997 upstream

As for LPAE, it's simply a case of skipping the leaf invalidation for a
regular unmap, and ensuring that the one in split_blk_unmap() is paired
with an explicit sync ASAP rather than relying on one which might only
eventually happen way down the line.
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

397aa2bf

iommu/arm-smmu-v3: Add support for non-strict mode · e50b3155

由 Zhen Lei 提交于 9月 20, 2018

commit 9662b99a19abccb0b7bfc91abb3fec1447c35bf0 upstream

Now that io-pgtable knows how to dodge strict TLB maintenance, all
that's left to do is bridge the gap between the IOMMU core requesting
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE for default domains, and showing the
appropriate IO_PGTABLE_QUIRK_NON_STRICT flag to alloc_io_pgtable_ops().
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
[rm: convert to domain attribute, tweak commit message]
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

e50b3155

iommu/io-pgtable-arm: Add support for non-strict mode · 15c316e7

由 Zhen Lei 提交于 9月 20, 2018

commit b6b65ca20bc93d14319f9b5cf98fd3c19a4244e3 upstream

Non-strict mode is simply a case of skipping 'regular' leaf TLBIs, since
the sync is already factored out into ops->iotlb_sync at the core API
level. Non-leaf invalidations where we change the page table structure
itself still have to be issued synchronously in order to maintain walk
caches correctly.

To save having to reason about it too much, make sure the invalidation
in arm_lpae_split_blk_unmap() just performs its own unconditional sync
to minimise the window in which we're technically violating the break-
before-make requirement on a live mapping. This might work out redundant
with an outer-level sync for strict unmaps, but we'll never be splitting
blocks on a DMA fastpath anyway.
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
[rm: tweak comment, commit message, split_blk_unmap logic and barriers]
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

15c316e7

iommu: Add "iommu.strict" command line option · c4e943b6

由 Zhen Lei 提交于 9月 20, 2018

commit 68a6efe86f6a16e25556a2aff40efad41097b486 upstream

Add a generic command line option to enable lazy unmapping via IOVA
flush queues, which will initally be suuported by iommu-dma. This echoes
the semantics of "intel_iommu=strict" (albeit with the opposite default
value), but in the driver-agnostic fashion of "iommu.passthrough".
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
[rm: move handling out of SMMUv3 driver, clean up documentation]
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
[will: dropped broken printk when parsing command-line option]
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

c4e943b6

iommu/dma: Add support for non-strict mode · c103f920

由 Zhen Lei 提交于 9月 20, 2018

commit 2da274cdf998a1c12afa6b5975db2df1df01edf1 upstream

With the flush queue infrastructure already abstracted into IOVA
domains, hooking it up in iommu-dma is pretty simple. Since there is a
degree of dependency on the IOMMU driver knowing what to do to play
along, we key the whole thing off a domain attribute which will be set
on default DMA ops domains to request non-strict invalidation. That way,
drivers can indicate the appropriate support by acknowledging the
attribute, and we can easily fall back to strict invalidation otherwise.

The flush queue callback needs a handle on the iommu_domain which owns
our cookie, so we have to add a pointer back to that, but neatly, that's
also sufficient to indicate whether we're using a flush queue or not,
and thus which way to release IOVAs. The only slight subtlety is
switching __iommu_dma_unmap() from calling iommu_unmap() to explicit
iommu_unmap_fast()/iommu_tlb_sync() so that we can elide the sync
entirely in non-strict mode.
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
[rm: convert to domain attribute, tweak comments and commit message]
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

c103f920

iommu/arm-smmu-v3: Implement flush_iotlb_all hook · 4ec05fe0

由 Zhen Lei 提交于 9月 20, 2018

commit 07fdef34d2be6811f00c6f9e4e2a1483cf86696c upstream

.flush_iotlb_all is currently stubbed to arm_smmu_iotlb_sync() since the
only time it would ever need to actually do anything is for callers
doing their own explicit batching, e.g.:

	iommu_unmap_fast(domain, ...);
	iommu_unmap_fast(domain, ...);
	iommu_iotlb_flush_all(domain, ...);

where since io-pgtable still issues the TLBI commands implicitly in the
unmap instead of implementing .iotlb_range_add, the "flush" only needs
to ensure completion of those already-in-flight invalidations.

However, we're about to start using it in anger with flush queues, so
let's get a proper implementation wired up.
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: NRobin Murphy <robin.murphy@arm.com>
[rm: document why it wasn't a bug]
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

4ec05fe0

iommu/arm-smmu-v3: Avoid back-to-back CMD_SYNC operations · 6b151cdb

由 Zhen Lei 提交于 8月 19, 2018

commit 901510ee32f7190902f6fe4affb463e5d86a804c upstream

Putting adjacent CMD_SYNCs into the command queue is nonsensical, but
can happen when multiple CPUs are inserting commands. Rather than leave
the poor old hardware to chew through these operations, we can instead
drop the subsequent SYNCs and poll for completion of the first. This
has been shown to improve IO performance under pressure, where the
number of SYNC operations reduces by about a third:

	CMD_SYNCs reduced:	19542181
	CMD_SYNCs total:	58098548	(include reduced)
	CMDs total:		116197099	(TLBI:SYNC about 1:1)
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

6b151cdb

iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout · afb27ac8

由 Zhen Lei 提交于 8月 19, 2018

commit 0f02477d16980938a84aba8688a4e3a303306116 upstream

The condition break condition of:

	(int)(VAL - sync_idx) >= 0

in the __arm_smmu_sync_poll_msi() polling loop requires that sync_idx
must be increased monotonically according to the sequence of the CMDs in
the cmdq.

However, since the msidata is populated using atomic_inc_return_relaxed()
before taking the command-queue spinlock, then the following scenario
can occur:

CPU0			CPU1
msidata=0
			msidata=1
			insert cmd1
insert cmd0
			smmu execute cmd1
smmu execute cmd0
			poll timeout, because msidata=1 is overridden by
			cmd0, that means VAL=0, sync_idx=1.

This is not a functional problem, since the caller will eventually either
timeout or exit due to another CMD_SYNC, however it's clearly not what
the code is supposed to be doing. Fix it, by incrementing the sequence
count with the command-queue lock held, allowing us to drop the atomic
operations altogether.
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
[will: dropped the specialised cmd building routine for now]
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

afb27ac8

iommu/io-pgtable-arm: Fix race handling in split_blk_unmap() · 4fed8b77

由 Robin Murphy 提交于 9月 06, 2018

commit 85c7a0f1ef624ef58173ef52ea77780257bdfe04 upstream

In removing the pagetable-wide lock, we gained the possibility of the
vanishingly unlikely case where we have a race between two concurrent
unmappers splitting the same block entry. The logic to handle this is
fairly straightforward - whoever loses the race frees their partial
next-level table and instead dereferences the winner's newly-installed
entry in order to fall back to a regular unmap, which intentionally
echoes the pre-existing case of recursively splitting a 1GB block down
to 4KB pages by installing a full table of 2MB blocks first.

Unfortunately, the chump who implemented that logic failed to update the
condition check for that fallback, meaning that if said race occurs at
the last level (where the loser's unmap_idx is valid) then the unmap
won't actually happen. Fix that to properly account for both the race
and recursive cases.

Fixes: 2c3d273e ("iommu/io-pgtable-arm: Support lockless operation")
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
[will: re-jig control flow to avoid duplicate cmpxchg test]
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

4fed8b77

iommu/arm-smmu-v3: Fix a couple of minor comment typos · fcc22750

由 John Garry 提交于 8月 17, 2018

commit 657135f3108122556c3cf60a78c6f0e76aeb60e6 commit

Fix some comment typos spotted.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

fcc22750

09 10月, 2019 4 次提交

sched/psi: Correct overly pessimistic size calculation · a4926952

由 Miles Chen 提交于 9月 12, 2019

commit 4adcdcea717cb2d8436bef00dd689aa5bc76f11b upstream.

When passing a equal or more then 32 bytes long string to psi_write(),
psi_write() copies 31 bytes to its buf and overwrites buf[30]
with '\0'. Which makes the input string 1 byte shorter than
it should be.

Fix it by copying sizeof(buf) bytes when nbytes >= sizeof(buf).

This does not cause problems in normal use case like:
"some 500000 10000000" or "full 500000 10000000" because they
are less than 32 bytes in length.

	/* assuming nbytes == 35 */
	char buf[32];

	buf_size = min(nbytes, (sizeof(buf) - 1)); /* buf_size = 31 */
	if (copy_from_user(buf, user_buf, buf_size))
		return -EFAULT;

	buf[buf_size - 1] = '\0'; /* buf[30] = '\0' */

Before:

 %cd /proc/pressure/
 %echo "123456789|123456789|123456789|1234" > memory
 [   22.473497] nbytes=35,buf_size=31
 [   22.473775] 123456789|123456789|123456789| (print 30 chars)
 %sh: write error: Invalid argument

 %echo "123456789|123456789|123456789|1" > memory
 [   64.916162] nbytes=32,buf_size=31
 [   64.916331] 123456789|123456789|123456789| (print 30 chars)
 %sh: write error: Invalid argument

After:

 %cd /proc/pressure/
 %echo "123456789|123456789|123456789|1234" > memory
 [  254.837863] nbytes=35,buf_size=32
 [  254.838541] 123456789|123456789|123456789|1 (print 31 chars)
 %sh: write error: Invalid argument

 %echo "123456789|123456789|123456789|1" > memory
 [ 9965.714935] nbytes=32,buf_size=32
 [ 9965.715096] 123456789|123456789|123456789|1 (print 31 chars)
 %sh: write error: Invalid argument

Also remove the superfluous parentheses.
Signed-off-by: NMiles Chen <miles.chen@mediatek.com>
Cc: <linux-mediatek@lists.infradead.org>
Cc: <wsd_upstream@mediatek.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190912103452.13281-1-miles.chen@mediatek.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>

a4926952

sched/psi: Do not require setsched permission from the trigger creator · 31107f61

由 Suren Baghdasaryan 提交于 7月 29, 2019

commit 04e048cf09d7b5fc995817cdc5ae1acd4482429c upstream.

When a process creates a new trigger by writing into /proc/pressure/*
files, permissions to write such a file should be used to determine whether
the process is allowed to do so or not. Current implementation would also
require such a process to have setsched capability. Setting of psi trigger
thread's scheduling policy is an implementation detail and should not be
exposed to the user level. Remove the permission check by using _nocheck
version of the function.
Suggested-by: NNick Kralevich <nnk@google.com>
Signed-off-by: NSuren Baghdasaryan <surenb@google.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: lizefan@huawei.com
Cc: mingo@redhat.com
Cc: akpm@linux-foundation.org
Cc: kernel-team@android.com
Cc: dennisszhou@gmail.com
Cc: dennis@kernel.org
Cc: hannes@cmpxchg.org
Cc: axboe@kernel.dk
Link: https://lkml.kernel.org/r/20190730013310.162367-1-surenb@google.comSigned-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>

31107f61

sched/psi: Reduce psimon FIFO priority · 171362e0

由 Peter Zijlstra 提交于 8月 01, 2019

commit 14f5c7b46a41a595fc61db37f55721714729e59e upstream.

PSI defaults to a FIFO-99 thread, reduce this to FIFO-1.

FIFO-99 is the very highest priority available to SCHED_FIFO and
it not a suitable default; it would indicate the psi work is the
most important work on the machine.

Since Real-Time tasks will have pre-allocated memory and locked it in
place, Real-Time tasks do not care about PSI. All it needs is to be
above OTHER.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Tested-by: NSuren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>

171362e0

blk-cgroup: turn on psi memstall stuff · 1417a792

由 Josef Bacik 提交于 7月 09, 2019

commit fd112c74652371a023f85d87b70bee7169e8f4d0 upstream.

With the psi stuff in place we can use the memstall flag to indicate
pressure that happens from throttling.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>

1417a792

30 9月, 2019 1 次提交

KVM: coalesced_mmio: add bounds checking · 00bf95fe

由 Matt Delco 提交于 9月 16, 2019

commit b60fe990c6b07ef6d4df67bc0530c7c90a62623a upstream.

The first/last indexes are typically shared with a user app.
The app can change the 'last' index that the kernel uses
to store the next result.  This change sanity checks the index
before using it for writing to a potentially arbitrary address.

This fixes CVE-2019-14821.

Cc: stable@vger.kernel.org
Fixes: 5f94c174 ("KVM: Add coalesced MMIO support (common part)")
Signed-off-by: NMatt Delco <delco@chromium.org>
Signed-off-by: NJim Mattson <jmattson@google.com>
Reported-by: syzbot+983c866c3dd6efa3662a@syzkaller.appspotmail.com
[Use READ_ONCE. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

00bf95fe

29 9月, 2019 1 次提交

sched/fair: Don't assign runtime for throttled cfs_rq · e56822fa

由 Liangyan 提交于 8月 26, 2019

commit 5e2d2cc2588bd3307ce3937acbc2ed03c830a861 upstream.

do_sched_cfs_period_timer() will refill cfs_b runtime and call
distribute_cfs_runtime to unthrottle cfs_rq, sometimes cfs_b->runtime
will allocate all quota to one cfs_rq incorrectly, then other cfs_rqs
attached to this cfs_b can't get runtime and will be throttled.

We find that one throttled cfs_rq has non-negative
cfs_rq->runtime_remaining and cause an unexpetced cast from s64 to u64
in snippet:

  distribute_cfs_runtime() {
    runtime = -cfs_rq->runtime_remaining + 1;
  }

The runtime here will change to a large number and consume all
cfs_b->runtime in this cfs_b period.

According to Ben Segall, the throttled cfs_rq can have
account_cfs_rq_runtime called on it because it is throttled before
idle_balance, and the idle_balance calls update_rq_clock to add time
that is accounted to the task.

This commit prevents cfs_rq to be assgined new runtime if it has been
throttled until that distribute_cfs_runtime is called.
Signed-off-by: NLiangyan <liangyan.peng@linux.alibaba.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NValentin Schneider <valentin.schneider@arm.com>
Reviewed-by: NBen Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: shanpeic@linux.alibaba.com
Cc: stable@vger.kernel.org
Cc: xlpang@linux.alibaba.com
Fixes: d3d9dc33 ("sched: Throttle entities exceeding their allowed bandwidth")
Link: https://lkml.kernel.org/r/20190826121633.6538-1-liangyan.peng@linux.alibaba.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>

e56822fa

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功