提交 · fe9af81e524e8a86bdd59c0cc0d9e2b0ccaf840f · openanolis / cloud-kernel

20 7月, 2018 11 次提交

x86/tsc: Redefine notsc to behave as tsc=unstable · fe9af81e

由 Pavel Tatashin 提交于 7月 19, 2018

Currently, the notsc kernel parameter disables the use of the TSC by
sched_clock(). However, this parameter does not prevent the kernel from
accessing tsc in other places.

The only rationale to boot with notsc is to avoid timing discrepancies on
multi-socket systems where TSC are not properly synchronized, and thus
exclude TSC from being used for time keeping. But that prevents using TSC
as sched_clock() as well, which is not necessary as the core sched_clock()
implementation can handle non synchronized TSC based sched clocks just
fine.

However, there is another method to solve the above problem: booting with
tsc=unstable parameter. This parameter allows sched_clock() to use TSC and
just excludes it from timekeeping.

So there is no real reason to keep notsc, but for compatibility reasons the
parameter has to stay. Make it behave like 'tsc=unstable' instead.

[ tglx: Massaged changelog ]
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com

fe9af81e

x86/CPU: Call detect_nopl() only on the BSP · 9b3661cd

由 Borislav Petkov 提交于 7月 19, 2018

Make it use the setup_* variants and have it be called only on the BSP and
drop the call in generic_identify() - X86_FEATURE_NOPL will be replicated
to the APs through the forced caps. Helps to keep the mess at a manageable
level.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-11-pasha.tatashin@oracle.com

9b3661cd

x86/jump_label: Initialize static branching early · 8990cac6

由 Pavel Tatashin 提交于 7月 19, 2018

Static branching is useful to runtime patch branches that are used in hot
path, but are infrequently changed.

The x86 clock framework is one example that uses static branches to setup
the best clock during boot and never changes it again.

It is desired to enable the TSC based sched clock early to allow fine
grained boot time analysis early on. That requires the static branching
functionality to be functional early as well.

Static branching requires patching nop instructions, thus,
arch_init_ideal_nops() must be called prior to jump_label_init().

Do all the necessary steps to call arch_init_ideal_nops() right after
early_cpu_init(), which also allows to insert a call to jump_label_init()
right after that. jump_label_init() will be called again from the generic
init code, but the code is protected against reinitialization already.

[ tglx: Massaged changelog ]
Suggested-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-10-pasha.tatashin@oracle.com

8990cac6

x86/alternatives, jumplabel: Use text_poke_early() before mm_init() · 6fffacb3

由 Pavel Tatashin 提交于 7月 19, 2018

It supposed to be safe to modify static branches after jump_label_init().
But, because static key modifying code eventually calls text_poke() it can
end up accessing a struct page which has not been initialized yet.

Here is how to quickly reproduce the problem. Insert code like this
into init/main.c:

| +static DEFINE_STATIC_KEY_FALSE(__test);
| asmlinkage __visible void __init start_kernel(void)
| {
|        char *command_line;
|@@ -587,6 +609,10 @@ asmlinkage __visible void __init start_kernel(void)
|        vfs_caches_init_early();
|        sort_main_extable();
|        trap_init();
|+       {
|+       static_branch_enable(&__test);
|+       WARN_ON(!static_branch_likely(&__test));
|+       }
|        mm_init();

The following warnings show-up:
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:701 text_poke+0x20d/0x230
RIP: 0010:text_poke+0x20d/0x230
Call Trace:
 ? text_poke_bp+0x50/0xda
 ? arch_jump_label_transform+0x89/0xe0
 ? __jump_label_update+0x78/0xb0
 ? static_key_enable_cpuslocked+0x4d/0x80
 ? static_key_enable+0x11/0x20
 ? start_kernel+0x23e/0x4c8
 ? secondary_startup_64+0xa5/0xb0

---[ end trace abdc99c031b8a90a ]---

If the code above is moved after mm_init(), no warning is shown, as struct
pages are initialized during handover from memblock.

Use text_poke_early() in static branching until early boot IRQs are enabled
and from there switch to text_poke. Also, ensure text_poke() is never
invoked when unitialized memory access may happen by using adding a
!after_bootmem assertion.
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-9-pasha.tatashin@oracle.com

6fffacb3

x86/kvmclock: Switch kvmclock data to a PER_CPU variable · 95a3d445

由 Thomas Gleixner 提交于 7月 19, 2018

The previous removal of the memblock dependency from kvmclock introduced a
static data array sized 64bytes * CONFIG_NR_CPUS. That's wasteful on large
systems when kvmclock is not used.

Replace it with:

 - A static page sized array of pvclock data. It's page sized because the
   pvclock data of the boot cpu is mapped into the VDSO so otherwise random
   other data would be exposed to the vDSO

 - A PER_CPU variable of pvclock data pointers. This is used to access the
   pcvlock data storage on each CPU.

The setup is done in two stages:

 - Early boot stores the pointer to the static page for the boot CPU in
   the per cpu data.

 - In the preparatory stage of CPU hotplug assign either an element of
   the static array (when the CPU number is in that range) or allocate
   memory and initialize the per cpu pointer.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Link: https://lkml.kernel.org/r/20180719205545.16512-8-pasha.tatashin@oracle.com

95a3d445

x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock · e499a9b6

由 Thomas Gleixner 提交于 7月 19, 2018

There is no point to have this in the kvm code itself and call it from
there. This can be called from an initcall and the parameter is cleared
when the hypervisor is not KVM.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Link: https://lkml.kernel.org/r/20180719205545.16512-7-pasha.tatashin@oracle.com

e499a9b6

x86/kvmclock: Mark variables __initdata and __ro_after_init · 42f8df93

由 Thomas Gleixner 提交于 7月 19, 2018

The kvmclock parameter is init data and the other variables are not
modified after init.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Link: https://lkml.kernel.org/r/20180719205545.16512-6-pasha.tatashin@oracle.com

42f8df93

x86/kvmclock: Cleanup the code · 146c394d

由 Thomas Gleixner 提交于 7月 19, 2018

- Cleanup the mrs write for wall clock. The type casts to (int) are sloppy
  because the wrmsr parameters are u32 and aside of that wrmsrl() already
  provides the high/low split for free.

- Remove the pointless get_cpu()/put_cpu() dance from various
  functions. Either they are called during early init where CPU is
  guaranteed to be 0 or they are already called from non preemptible
  context where smp_processor_id() can be used safely

- Simplify the convoluted check for kvmclock in the init function.

- Mark the parameter parsing function __init. No point in keeping it
  around.

- Convert to pr_info()
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Link: https://lkml.kernel.org/r/20180719205545.16512-5-pasha.tatashin@oracle.com

146c394d

x86/kvmclock: Decrapify kvm_register_clock() · 7a5ddc8f

由 Thomas Gleixner 提交于 7月 19, 2018

The return value is pointless because the wrmsr cannot fail if
KVM_FEATURE_CLOCKSOURCE or KVM_FEATURE_CLOCKSOURCE2 are set.

kvm_register_clock() is only called locally so wants to be static.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Link: https://lkml.kernel.org/r/20180719205545.16512-4-pasha.tatashin@oracle.com

7a5ddc8f

x86/kvmclock: Remove page size requirement from wall_clock · 7ef363a3

由 Thomas Gleixner 提交于 7月 19, 2018

There is no requirement for wall_clock data to be page aligned or page
sized.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Link: https://lkml.kernel.org/r/20180719205545.16512-3-pasha.tatashin@oracle.com

7ef363a3

x86/kvmclock: Remove memblock dependency · 368a540e

由 Pavel Tatashin 提交于 7月 19, 2018

KVM clock is initialized later compared to other hypervisor clocks because
it has a dependency on the memblock allocator.

Bring it in line with other hypervisors by using memory from the BSS
instead of allocating it.

The benefits:

  - Remove ifdef from common code
  - Earlier availability of the clock
  - Remove dependency on memblock, and reduce code

The downside:

  - Static allocation of the per cpu data structures sized NR_CPUS * 64byte
    Will be addressed in follow up patches.

[ tglx: Split out from larger series ]
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Link: https://lkml.kernel.org/r/20180719205545.16512-2-pasha.tatashin@oracle.com

368a540e

18 7月, 2018 2 次提交

kvmclock: fix TSC calibration for nested guests · e10f7805

由 Peng Hao 提交于 7月 14, 2018

Inside a nested guest, access to hardware can be slow enough that
tsc_read_refs always return ULLONG_MAX, causing tsc_refine_calibration_work
to be called periodically and the nested guest to spend a lot of time
reading the ACPI timer.

However, if the TSC frequency is available from the pvclock page,
we can just set X86_FEATURE_TSC_KNOWN_FREQ and avoid the recalibration.
'refine' operation.
Suggested-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NPeng Hao <peng.hao2@zte.com.cn>
[Commit message rewritten. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e10f7805

KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled · 2307af1c

由 Liran Alon 提交于 6月 29, 2018

When eVMCS is enabled, all VMCS allocated to be used by KVM are marked
with revision_id of KVM_EVMCS_VERSION instead of revision_id reported
by MSR_IA32_VMX_BASIC.

However, even though not explictly documented by TLFS, VMXArea passed
as VMXON argument should still be marked with revision_id reported by
physical CPU.

This issue was found by the following setup:
* L0 = KVM which expose eVMCS to it's L1 guest.
* L1 = KVM which consume eVMCS reported by L0.
This setup caused the following to occur:
1) L1 execute hardware_enable().
2) hardware_enable() calls kvm_cpu_vmxon() to execute VMXON.
3) L0 intercept L1 VMXON and execute handle_vmon() which notes
vmxarea->revision_id != VMCS12_REVISION and therefore fails with
nested_vmx_failInvalid() which sets RFLAGS.CF.
4) L1 kvm_cpu_vmxon() don't check RFLAGS.CF for failure and therefore
hardware_enable() continues as usual.
5) L1 hardware_enable() then calls ept_sync_global() which executes
INVEPT.
6) L0 intercept INVEPT and execute handle_invept() which notes
!vmx->nested.vmxon and thus raise a #UD to L1.
7) Raised #UD caused L1 to panic.
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 773e8a04Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2307af1c

15 7月, 2018 6 次提交

x86/kvmclock: set pvti_cpu0_va after enabling kvmclock · 94ffba48

由 Radim Krčmář 提交于 7月 15, 2018

pvti_cpu0_va is the address of shared kvmclock data structure.

pvti_cpu0_va is currently kept unset (1) on 32 bit systems, (2) when
kvmclock vsyscall is disabled, and (3) if kvmclock is not stable.
This poses a problem, because kvm_ptp needs pvti_cpu0_va, but (1) can
work on 32 bit, (2) has little relation to the vsyscall, and (3) does
not need stable kvmclock (although kvmclock won't be used for system
clock if it's not stable, so kvm_ptp is pointless in that case).

Expose pvti_cpu0_va whenever kvmclock is enabled to allow all users to
work with it.

This fixes a regression found on Gentoo: https://bugs.gentoo.org/658544.

Fixes: 9f08890a ("x86/pvclock: add setter for pvclock_pvti_cpu0_va")
Cc: stable@vger.kernel.org
Reported-by: NAndreas Steinmetz <ast@domdv.de>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

94ffba48

x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD · d30f370d

由 Janakarajan Natarajan 提交于 6月 27, 2018

Prevent a config where KVM_AMD=y and CRYPTO_DEV_CCP_DD=m thereby ensuring
that AMD Secure Processor device driver will be built-in when KVM_AMD is
also built-in.

v1->v2:
* Removed usage of 'imply' Kconfig option.
* Change patch commit message.

Fixes: 505c9e94 ("KVM: x86: prefer "depends on" to "select" for SEV")

Cc: <stable@vger.kernel.org> # 4.16.x
Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: NBrijesh Singh <brijesh.singh@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d30f370d

kvm: nVMX: Restore exit qual for VM-entry failure due to MSR loading · 0b88abdc

由 Jim Mattson 提交于 5月 30, 2018

This exit qualification was inadvertently dropped when the two
VM-entry failure blocks were coalesced.

Fixes: e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest")
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0b88abdc

x86/kvm/vmx: don't read current->thread.{fs,gs}base of legacy tasks · b062b794

由 Vitaly Kuznetsov 提交于 7月 11, 2018

When we switched from doing rdmsr() to reading FS/GS base values from
current->thread we completely forgot about legacy 32-bit userspaces which
we still support in KVM (why?). task->thread.{fsbase,gsbase} are only
synced for 64-bit processes, calling save_fsgs_for_kvm() and using
its result from current is illegal for legacy processes.

There's no ARCH_SET_FS/GS prctls for legacy applications. Base MSRs are,
however, not always equal to zero. Intel's manual says (3.4.4 Segment
Loading Instructions in IA-32e Mode):

"In order to set up compatibility mode for an application, segment-load
instructions (MOV to Sreg, POP Sreg) work normally in 64-bit mode. An
entry is read from the system descriptor table (GDT or LDT) and is loaded
in the hidden portion of the segment register.
...
The hidden descriptor register fields for FS.base and GS.base are
physically mapped to MSRs in order to load all address bits supported by
a 64-bit implementation.
"

The issue was found by strace test suite where 32-bit ioctl_kvm_run test
started segfaulting.
Reported-by: NDmitry V. Levin <ldv@altlinux.org>
Bisected-by: NMasatake YAMATO <yamato@redhat.com>
Fixes: 42b933b5 ("x86/kvm/vmx: read MSR_{FS,KERNEL_GS}_BASE from current->thread")
Cc: stable@vger.kernel.org
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b062b794

KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR · cd283252

由 Paolo Bonzini 提交于 6月 25, 2018

This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all
requested features are available on the host.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cd283252

x86/purgatory: add missing FORCE to Makefile target · fa8cbda8

由 Philipp Rudo 提交于 7月 13, 2018

- Build the kernel without the fix
- Add some flag to the purgatories KBUILD_CFLAGS,I used
  -fno-asynchronous-unwind-tables
- Re-build the kernel

When you look at makes output you see that sha256.o is not re-build in the
last step.  Also readelf -S still shows the .eh_frame section for
sha256.o.

With the fix sha256.o is rebuilt in the last step.

Without FORCE make does not detect changes only made to the command line
options.  So object files might not be re-built even when they should be.
Fix this by adding FORCE where it is missing.

Link: http://lkml.kernel.org/r/20180704110044.29279-2-prudo@linux.ibm.com
Fixes: df6f2801 ("kernel/kexec_file.c: move purgatories sha256 to common code")
Signed-off-by: NPhilipp Rudo <prudo@linux.ibm.com>
Acked-by: NDave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>	[4.17+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fa8cbda8

13 7月, 2018 1 次提交

xen: setup pv irq ops vector earlier · 0ce0bba4

由 Juergen Gross 提交于 7月 12, 2018

Setting pv_irq_ops for Xen PV domains should be done as early as
possible in order to support e.g. very early printk() usage.

The same applies to xen_vcpu_info_reset(0), as it is needed for the
pv irq ops.

Move the call of xen_setup_machphys_mapping() after initializing the
pv functions as it contains a WARN_ON(), too.

Remove the no longer necessary conditional in xen_init_irq_ops()
from PVH V1 times to make clear this is a PV only function.

Cc: <stable@vger.kernel.org> # 4.14
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NJuergen Gross <jgross@suse.com>

0ce0bba4

12 7月, 2018 5 次提交

ARM: dts: am3517.dtsi: Disable reference to OMAP3 OTG controller · 92384741

由 Adam Ford 提交于 7月 11, 2018

The AM3517 has a different OTG controller location than the OMAP3,
which is included from omap3.dtsi.  This results in a hwmod error.
Since the AM3517 has a different OTG controller address, this patch
disabes one that is isn't available.
Signed-off-by: NAdam Ford <aford173@gmail.com>
Signed-off-by: NTony Lindgren <tony@atomide.com>

92384741

ARM: DRA7/OMAP5: Enable ACTLR[0] (Enable invalidates of BTB) for secondary cores · 2f8b5b21

由 Nishanth Menon 提交于 7月 10, 2018

Call secure services to enable ACTLR[0] (Enable invalidates of BTB with
ICIALLU) when branch hardening is enabled for kernel.

On GP devices OMAP5/DRA7, there is no possibility to update secure
side since "secure world" is ROM and there are no override mechanisms
possible. On HS devices, appropriate PPA should do the workarounds as
well.

However, the configuration is only done for secondary core, since it is
expected that firmware/bootloader will have enabled the required
configuration for the primary boot core (note: bootloaders typically
will NOT enable secondary processors, since it has no need to do so).
Signed-off-by: NNishanth Menon <nm@ti.com>
Signed-off-by: NTony Lindgren <tony@atomide.com>

2f8b5b21

xen: remove global bit from __default_kernel_pte_mask for pv guests · e69b5d30

由 Juergen Gross 提交于 7月 02, 2018

When removing the global bit from __supported_pte_mask do the same for
__default_kernel_pte_mask in order to avoid the WARN_ONCE() in
check_pgprot() when setting a kernel pte before having called
init_mem_mapping().

Cc: <stable@vger.kernel.org> # 4.17
Reported-by: NMichael Young <m.a.young@durham.ac.uk>
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NJuergen Gross <jgross@suse.com>

e69b5d30

ARM: 8780/1: ftrace: Only set kernel memory back to read-only after boot · b4c7e2bd

由 Steven Rostedt (VMware) 提交于 7月 10, 2018

Dynamic ftrace requires modifying the code segments that are usually
set to read-only. To do this, a per arch function is called both before
and after the ftrace modifications are performed. The "before" function
will set kernel code text to read-write to allow for ftrace to make the
modifications, and the "after" function will set the kernel code text
back to "read-only" to keep the kernel code text protected.

The issue happens when dynamic ftrace is tested at boot up. The test is
done before the kernel code text has been set to read-only. But the
"before" and "after" calls are still performed. The "after" call will
change the kernel code text to read-only prematurely, and other boot
code that expects this code to be read-write will fail.

The solution is to add a variable that is set when the kernel code text
is expected to be converted to read-only, and make the ftrace "before"
and "after" calls do nothing if that variable is not yet set. This is
similar to the x86 solution from commit 16239630 ("ftrace, x86:
make kernel text writable only for conversions").

Link: http://lkml.kernel.org/r/20180620212906.24b7b66e@vmware.local.homeReported-by: NStefan Agner <stefan@agner.ch>
Tested-by: NStefan Agner <stefan@agner.ch>
Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>

b4c7e2bd

arm64: neon: Fix function may_use_simd() return error status · 2fd8eb4a

由 Yandong Zhao 提交于 7月 11, 2018

It does not matter if the caller of may_use_simd() migrates to
another cpu after the call, but it is still important that the
kernel_neon_busy percpu instance that is read matches the cpu the
task is running on at the time of the read.

This means that raw_cpu_read() is not sufficient.  kernel_neon_busy
may appear true if the caller migrates during the execution of
raw_cpu_read() and the next task to be scheduled in on the initial
cpu calls kernel_neon_begin().

This patch replaces raw_cpu_read() with this_cpu_read() to protect
against this race.

Cc: <stable@vger.kernel.org>
Fixes: cb84d11e ("arm64: neon: Remove support for nested or hardirq kernel-mode NEON")
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: NDave Martin <Dave.Martin@arm.com>
Reviewed-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NYandong Zhao <yandong77520@gmail.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

2fd8eb4a

11 7月, 2018 2 次提交

efi/x86: Fix mixed mode reboot loop by removing pointless call to PciIo->Attributes() · e2967018

由 Ard Biesheuvel 提交于 7月 11, 2018

Hans de Goede reported that his mixed EFI mode Bay Trail tablet
would not boot at all any more, but enter a reboot loop without
any logs printed by the kernel.

Unbreak 64-bit Linux/x86 on 32-bit UEFI:

When it was first introduced, the EFI stub code that copies the
contents of PCI option ROMs originally only intended to do so if
the EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM attribute was *not* set.

The reason was that the UEFI spec permits PCI option ROM images
to be provided by the platform directly, rather than via the ROM
BAR, and in this case, the OS can only access them at runtime if
they are preserved at boot time by copying them from the areas
described by PciIo->RomImage and PciIo->RomSize.

However, it implemented this check erroneously, as can be seen in
commit:

  dd5fc854 ("EFI: Stash ROMs if they're not in the PCI BAR")

which introduced:

    if (!attributes & EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM)
            continue;

and given that the numeric value of EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM
is 0x4000, this condition never becomes true, and so the option ROMs
were copied unconditionally.

This was spotted and 'fixed' by commit:

  886d751a ("x86, efi: correct precedence of operators in setup_efi_pci")

but inadvertently inverted the logic at the same time, defeating
the purpose of the code, since it now only preserves option ROM
images that can be read from the ROM BAR as well.

Unsurprisingly, this broke some systems, and so the check was removed
entirely in the following commit:

  73970188 ("x86, efi: remove attribute check from setup_efi_pci")

It is debatable whether this check should have been included in the
first place, since the option ROM image provided to the UEFI driver by
the firmware may be different from the one that is actually present in
the card's flash ROM, and so whatever PciIo->RomImage points at should
be preferred regardless of whether the attribute is set.

As this was the only use of the attributes field, we can remove
the call to PciIo->Attributes() entirely, which is especially
nice because its prototype involves uint64_t type by-value
arguments which the EFI mixed mode has trouble dealing with.

Any mixed mode system with PCI is likely to be affected.
Tested-by: NWilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Tested-by: NHans de Goede <hdegoede@redhat.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180711090235.9327-2-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

e2967018

ARM: 8775/1: NOMMU: Use instr_sync instead of plain isb in common code · cea39477

由 Vladimir Murzin 提交于 6月 18, 2018

Greg reported that commit 3c241210 ("ARM: 8756/1: NOMMU: Postpone
MPU activation till __after_proc_init") is causing breakage for the
old Versatile platform in no-MMU mode (with out-of-tree patches):

  AS      arch/arm/kernel/head-nommu.o
arch/arm/kernel/head-nommu.S: Assembler messages:
arch/arm/kernel/head-nommu.S:180: Error: selected processor does not support `isb' in ARM mode
scripts/Makefile.build:417: recipe for target 'arch/arm/kernel/head-nommu.o' failed
make[2]: *** [arch/arm/kernel/head-nommu.o] Error 1
Makefile:1034: recipe for target 'arch/arm/kernel' failed
make[1]: *** [arch/arm/kernel] Error 2

Since the code is common for all NOMMU builds usage of the isb was a
bad idea (please, note that isb also used in MPU related code which is
fine because MPU has dependency on CPU_V7/CPU_V7M), instead use more
robust instr_sync assembler macro.

Fixes: 3c241210 ("ARM: 8756/1: NOMMU: Postpone MPU activation till __after_proc_init")
Reported-by: NGreg Ungerer <gerg@kernel.org>
Tested-by: NGreg Ungerer <gerg@kernel.org>
Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>

cea39477

10 7月, 2018 1 次提交

Revert "arm64: Use aarch64elf and aarch64elfb emulation mode variants" · 96f95a17

由 Laura Abbott 提交于 7月 09, 2018

This reverts commit 38fc4248.

Distributions such as Fedora and Debian do not package the ELF linker
scripts with their toolchains, resulting in kernel build failures such
as:

  |   CHK     include/generated/compile.h
  |   LD [M]  arch/arm64/crypto/sha512-ce.o
  | aarch64-linux-gnu-ld: cannot open linker script file ldscripts/aarch64elf.xr: No such file or directory
  | make[1]: *** [scripts/Makefile.build:530: arch/arm64/crypto/sha512-ce.o] Error 1
  | make: *** [Makefile:1029: arch/arm64/crypto] Error 2

Revert back to the linux targets for now, adding a comment to the Makefile
so we don't accidentally break this in the future.

Cc: Paul Kocialkowski <contact@paulk.fr>
Cc: <stable@vger.kernel.org>
Fixes: 38fc4248 ("arm64: Use aarch64elf and aarch64elfb emulation mode variants")
Tested-by: NKevin Hilman <khilman@baylibre.com>
Signed-off-by: NLaura Abbott <labbott@redhat.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

96f95a17

08 7月, 2018 1 次提交

x86/mtrr: Don't copy out-of-bounds data in mtrr_write · 15279df6

由 Jann Horn 提交于 7月 06, 2018

Don't access the provided buffer out of bounds - this can cause a kernel
out-of-bounds read when invoked through sys_splice() or other things that
use kernel_write()/__kernel_write().

Fixes: 7f8ec5a4 ("x86/mtrr: Convert to use strncpy_from_user() helper")
Signed-off-by: NJann Horn <jannh@google.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180706215003.156702-1-jannh@google.com

15279df6

07 7月, 2018 1 次提交

ARM: pxa: irq: fix handling of ICMR registers in suspend/resume · 0c1049dc

由 Daniel Mack 提交于 7月 06, 2018

PXA3xx platforms have 56 interrupts that are stored in two ICMR
registers. The code in pxa_irq_suspend() and pxa_irq_resume() however
does a simple division by 32 which only leads to one register being
saved at suspend and restored at resume time. The NAND interrupt
setting, for instance, is lost.

Fix this by using DIV_ROUND_UP() instead.
Signed-off-by: NDaniel Mack <daniel@zonque.org>
Signed-off-by: NRobert Jarzmik <robert.jarzmik@free.fr>

0c1049dc

06 7月, 2018 4 次提交

ARM: dts: armada-38x: use the new thermal binding · 568cc2f0

由 Baruch Siach 提交于 7月 03, 2018

Commit 2f28e4c2 (thermal: armada: Clarify control registers
accesses) introduced the new thermal binding. The new binding extends
the second registers field size to 8. Switch to the new binding to fix
thermal reading values. Without this change the fix for errata #132698
introduced in commit 8c0b888f (thermal: armada: Change sensors trim
default value) has no effect.

Cc: stable@vger.kernel.org # v4.16+
Reviewed-by: NMiquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: NBaruch Siach <baruch@tkos.co.il>
Signed-off-by: NGregory CLEMENT <gregory.clement@bootlin.com>

568cc2f0

x86/hyper-v: Fix the circular dependency in IPI enlightenment · 1268ed0c

由 K. Y. Srinivasan 提交于 7月 03, 2018

The IPI hypercalls depend on being able to map the Linux notion of CPU ID
to the hypervisor's notion of the CPU ID. The array hv_vp_index[] provides
this mapping. Code for populating this array depends on the IPI functionality.
Break this circular dependency.

[ tglx: Use a proper define instead of '-1' with a u32 variable as pointed
  	out by Vitaly ]

Fixes: 68bb7bfb ("X86/Hyper-V: Enable IPI enlightenments")
Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NMichael Kelley <mikelley@microsoft.com>
Cc: gregkh@linuxfoundation.org
Cc: devel@linuxdriverproject.org
Cc: olaf@aepfle.de
Cc: apw@canonical.com
Cc: jasowang@redhat.com
Cc: hpa@zytor.com
Cc: sthemmin@microsoft.com
Cc: Michael.H.Kelley@microsoft.com
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180703230155.15160-1-kys@linuxonhyperv.com

1268ed0c

MIPS: Fix ioremap() RAM check · 523402fa

由 Paul Burton 提交于 7月 05, 2018

We currently attempt to check whether a physical address range provided
to __ioremap() may be in use by the page allocator by examining the
value of PageReserved for each page in the region - lowmem pages not
marked reserved are presumed to be in use by the page allocator, and
requests to ioremap them fail.

The way we check this has been broken since commit 92923ca3 ("mm:
meminit: only set page reserved in the memblock region"), because
memblock will typically not have any knowledge of non-RAM pages and
therefore those pages will not have the PageReserved flag set. Thus when
we attempt to ioremap a region outside of RAM we incorrectly fail
believing that the region is RAM that may be in use.

In most cases ioremap() on MIPS will take a fast-path to use the
unmapped kseg1 or xkphys virtual address spaces and never hit this path,
so the only way to hit it is for a MIPS32 system to attempt to ioremap()
an address range in lowmem with flags other than _CACHE_UNCACHED.
Perhaps the most straightforward way to do this is using
ioremap_uncached_accelerated(), which is how the problem was discovered.

Fix this by making use of walk_system_ram_range() to test the address
range provided to __ioremap() against only RAM pages, rather than all
lowmem pages. This means that if we have a lowmem I/O region, which is
very common for MIPS systems, we're free to ioremap() address ranges
within it. A nice bonus is that the test is no longer limited to lowmem.

The approach here matches the way x86 performed the same test after
commit c81c8a1e ("x86, ioremap: Speed up check for RAM pages") until
x86 moved towards a slightly more complicated check using walk_mem_res()
for unrelated reasons with commit 0e4c12b4 ("x86/mm, resource: Use
PAGE_KERNEL protection for ioremap of memory pages").
Signed-off-by: NPaul Burton <paul.burton@mips.com>
Reported-by: NSerge Semin <fancer.lancer@gmail.com>
Tested-by: NSerge Semin <fancer.lancer@gmail.com>
Fixes: 92923ca3 ("mm: meminit: only set page reserved in the memblock region")
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.2+
Patchwork: https://patchwork.linux-mips.org/patch/19786/

523402fa

arm64: remove no-op -p linker flag · 1a381d4a

由 Greg Hackmann 提交于 6月 27, 2018

Linking the ARM64 defconfig kernel with LLVM lld fails with the error:

  ld.lld: error: unknown argument: -p
  Makefile:1015: recipe for target 'vmlinux' failed

Without this flag, the ARM64 defconfig kernel successfully links with
lld and boots on Dragonboard 410c.

After digging through binutils source and changelogs, it turns out that
-p is only relevant to ancient binutils installations targeting 32-bit
ARM.  binutils accepts -p for AArch64 too, but it's always been
undocumented and silently ignored.  A comment in
ld/emultempl/aarch64elf.em explains that it's "Only here for backwards
compatibility".

Since this flag is a no-op on ARM64, we can safely drop it.
Acked-by: NWill Deacon <will.deacon@arm.com>
Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
Signed-off-by: NGreg Hackmann <ghackmann@google.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

1a381d4a

05 7月, 2018 6 次提交

arm64: add endianness option to LDFLAGS instead of LD · 2893af07

由 Masahiro Yamada 提交于 7月 03, 2018

With the recent syntax extension, Kconfig is now able to evaluate the
compiler / toolchain capability.

However, accumulating flags to 'LD' is not compatible with the way
it works; 'LD' must be passed to Kconfig to call $(ld-option,...)
from Kconfig files.  If you tweak 'LD' in arch Makefile depending on
CONFIG_CPU_BIG_ENDIAN, this would end up with circular dependency
between Makefile and Kconfig.
Acked-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

2893af07

RISC-V: Fix PTRACE_SETREGSET bug. · 1db9b809

由 Jim Wilson 提交于 6月 11, 2018

In riscv_gpr_set, pass regs instead of &regs to user_regset_copyin to fix
gdb segfault.
Signed-off-by: NJim Wilson <jimw@sifive.com>
Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>

1db9b809

RISC-V: Don't include irq-riscv-intc.h · 86065448

由 Palmer Dabbelt 提交于 6月 22, 2018

This file has never existed in the upstream kernel, but it's guarded by
an #ifdef that's also never existed in the upstream kernel. As a part
of our interrupt controller refactoring this header is no longer
necessary, but this reference managed to sneak in anyway.
Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>

86065448

riscv: remove unnecessary of_platform_populate call · f67f10b8

由 Rob Herring 提交于 6月 19, 2018

The DT core will call of_platform_default_populate, so it is not
necessary for arch specific code to call it unless there are custom
match entries, auxdata or parent device. Neither of those apply here, so
remove the call.

Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-riscv@lists.infradead.org
Signed-off-by: NRob Herring <robh@kernel.org>
Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>

f67f10b8

RISC-V: fix R_RISCV_ADD32/R_RISCV_SUB32 relocations · 781c8fe2

由 Andreas Schwab 提交于 6月 12, 2018

The R_RISCV_ADD32/R_RISCV_SUB32 relocations should add/subtract the
address of the symbol (without overflow check), not its contents.
Signed-off-by: NAndreas Schwab <schwab@suse.de>
Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>

781c8fe2

RISC-V: Change variable type for 32-bit compatible · 7df85002

由 Zong Li 提交于 6月 25, 2018

Signed-off-by: NZong Li <zong@andestech.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>

7df85002

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功