提交 · ec2e76b4c73362231e7d6903e14fc9817d99f3ae · openanolis / cloud-kernel

02 4月, 2015 2 次提交

perf/x86/intel: Fix Haswell CYCLE_ACTIVITY.* counter constraints · c420f19b

由 Andi Kleen 提交于 3月 09, 2015

Some of the CYCLE_ACTIVITY.* events can only be scheduled on
counter 2.  Due to a typo Haswell matched those with
INTEL_EVENT_CONSTRAINT, which lead to the events never
matching as the comparison does not expect anything
in the umask too. Fix the typo.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1425925222-32361-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

c420f19b

perf/x86/intel: Filter branches for PEBS event · 687805e4

由 Kan Liang 提交于 3月 27, 2015

For supporting Intel LBR branches filtering, Intel LBR sharing logic
mechanism is introduced from commit b36817e8 ("perf/x86: Add Intel
LBR sharing logic"). It modifies __intel_shared_reg_get_constraints() to
config lbr_sel, which is finally used to set LBR_SELECT.

However, the intel_shared_regs_constraints() function is called after
intel_pebs_constraints(). The PEBS event will return immediately after
intel_pebs_constraints(). So it's impossible to filter branches for PEBS
events.

This patch moves intel_shared_regs_constraints() ahead of
intel_pebs_constraints().

We can safely do that because the intel_shared_regs_constraints() function
only returns empty constraint if its rejecting the event, otherwise it
returns NULL such that we continue calling intel_pebs_constraints() and
x86_get_event_constraint().
Signed-off-by: NKan Liang <kan.liang@intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1427467105-9260-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

687805e4

25 3月, 2015 1 次提交

x86/asm/entry: Check for syscall exit work with IRQs disabled · b3494a4a

由 Andy Lutomirski 提交于 3月 23, 2015

We currently have a race: if we're preempted during syscall
exit, we can fail to process syscall return work that is queued
up while we're preempted in ret_from_sys_call after checking
ti.flags.

Fix it by disabling interrupts before checking ti.flags.
Reported-by: NStefan Seyfried <stefan.seyfried@googlemail.com>
Reported-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Acked-by: NDenys Vlasenko <dvlasenk@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Fixes: 96b6352c ("x86_64, entry: Remove the syscall exit audit")
Link: http://lkml.kernel.org/r/189320d42b4d671df78c10555976bb10af1ffc75.1427137498.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

b3494a4a

24 3月, 2015 1 次提交

KVM: x86: call irq notifiers with directed EOI · c806a6ad

由 Radim Krčmář 提交于 3月 18, 2015

kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled.
We need to do that for irq notifiers.  (Like with edge interrupts.)

Fix it by skipping EOI broadcast only.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Tested-by: NBandan Das <bsd@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

c806a6ad

23 3月, 2015 1 次提交

x86/xen: prepare p2m list for memory hotplug · 633d6f17

由 Juergen Gross 提交于 3月 20, 2015

Commit 054954eb ("xen: switch to linear
virtual mapped sparse p2m list") introduced a regression regarding to
memory hotplug for a pv-domain: as the virtual space for the p2m list
is allocated for the to be expected memory size of the domain only,
hotplugged memory above that size will not be usable by the domain.

Correct this by using a configurable size for the p2m list in case of
memory hotplug enabled (default supported memory size is 512 GB for
64 bit domains and 4 GB for 32 bit domains).
Signed-off-by: NJuergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org> # 3.19+
Reviewed-by: NDaniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

633d6f17

20 3月, 2015 1 次提交

Revert "x86/PCI: Refine the way to release PCI IRQ resources" · 9e8ce4b9

由 Rafael J. Wysocki 提交于 3月 20, 2015

Commit b4b55cda (Refine the way to release PCI IRQ resources)
introduced a regression in the PCI IRQ resource management by causing
the IRQ resource of a device, established when pci_enabled_device()
is called on a fully disabled device, to be released when the driver
is unbound from the device, regardless of the enable_cnt.

This leads to the situation that an ill-behaved driver can now make a
device unusable to subsequent drivers by an imbalance in their use of
pci_enable/disable_device().  That is a serious problem for secondary
drivers like vfio-pci, which are innocent of the transgressions of
the previous driver.

Since the solution of this problem is not immediate and requires
further discussion, revert commit b4b55cda and the issue it was
supposed to address (a bug related to xen-pciback) will be taken
care of in a different way going forward.
Reported-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

9e8ce4b9

18 3月, 2015 1 次提交

KVM: nVMX: mask unrestricted_guest if disabled on L0 · 0790ec17

由 Radim Krčmář 提交于 3月 17, 2015

If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
L0.  L1 triple faulted when running L2 guest that required emulation.

Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
in L0's dmesg:
  WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()

Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
the host doesn't have it enabled.

Fixes: 78051e3b ("KVM: nVMX: Disable unrestricted mode if ept=0")
Cc: stable@vger.kernel.org
Tested-By: NKashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

0790ec17

16 3月, 2015 1 次提交

Revert "x86/mm/ASLR: Propagate base load address calculation" · 69797daf

由 Borislav Petkov 提交于 3月 16, 2015

This reverts commit:

  f47233c2 ("x86/mm/ASLR: Propagate base load address calculation")

The main reason for the revert is that the new boot flag does not work
at all currently, and in order to make this work, we need non-trivial
changes to the x86 boot code which we didn't manage to get done in
time for merging.

And even if we did, they would've been too risky so instead of
rushing things and break booting 4.1 on boxes left and right, we
will be very strict and conservative and will take our time with
this to fix and test it properly.
Reported-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20150316100628.GD22995@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>

69797daf

13 3月, 2015 5 次提交

KVM: VMX: Set msr bitmap correctly if vcpu is in guest mode · 670125bd

由 Wincy Van 提交于 3月 04, 2015

In commit 3af18d9c ("KVM: nVMX: Prepare for using hardware MSR bitmap"),
we are setting MSR_BITMAP in prepare_vmcs02 if we should use hardware. This
is not enough since the field will be modified by following vmx_set_efer.

Fix this by setting vmx_msr_bitmap_nested in vmx_set_msr_bitmap if vcpu is
in guest mode.
Signed-off-by: NWincy Van <fanwenyi0529@gmail.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

670125bd

x86/fpu: Drop_fpu() should not assume that tsk equals current · f4c36863

由 Oleg Nesterov 提交于 3月 13, 2015

drop_fpu() does clear_used_math() and usually this is correct
because tsk == current.

However switch_fpu_finish()->restore_fpu_checking() is called before
__switch_to() updates the "current_task" variable. If it fails,
we will wrongly clear the PF_USED_MATH flag of the previous task.

So use clear_stopped_child_used_math() instead.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150309171041.GB11388@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f4c36863

x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig() · a7c80ebc

由 Oleg Nesterov 提交于 3月 13, 2015

math_state_restore() assumes it is called with irqs disabled,
but this is not true if the caller is __restore_xstate_sig().

This means that if ia32_fxstate == T and __copy_from_user()
fails, __restore_xstate_sig() returns with irqs disabled too.

This triggers:

  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
   dump_stack
   ___might_sleep
   ? _raw_spin_unlock_irqrestore
   __might_sleep
   down_read
   ? _raw_spin_unlock_irqrestore
   print_vma_addr
   signal_fault
   sys32_rt_sigreturn

Change __restore_xstate_sig() to call set_used_math()
unconditionally. This avoids enabling and disabling interrupts
in math_state_restore(). If copy_from_user() fails, we can
simply do fpu_finit() by hand.

[ Note: this is only the first step. math_state_restore() should
        not check used_math(), it should set this flag. While
	init_fpu() should simply die. ]
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a7c80ebc

crypto: aesni - fix memory usage in GCM decryption · ccfe8c3f

由 Stephan Mueller 提交于 3月 12, 2015

The kernel crypto API logic requires the caller to provide the
length of (ciphertext || authentication tag) as cryptlen for the
AEAD decryption operation. Thus, the cipher implementation must
calculate the size of the plaintext output itself and cannot simply use
cryptlen.

The RFC4106 GCM decryption operation tries to overwrite cryptlen memory
in req->dst. As the destination buffer for decryption only needs to hold
the plaintext memory but cryptlen references the input buffer holding
(ciphertext || authentication tag), the assumption of the destination
buffer length in RFC4106 GCM operation leads to a too large size. This
patch simply uses the already calculated plaintext size.

In addition, this patch fixes the offset calculation of the AAD buffer
pointer: as mentioned before, cryptlen already includes the size of the
tag. Thus, the tag does not need to be added. With the addition, the AAD
will be written beyond the already allocated buffer.

Note, this fixes a kernel crash that can be triggered from user space
via AF_ALG(aead) -- simply use the libkcapi test application
from [1] and update it to use rfc4106-gcm-aes.

Using [1], the changes were tested using CAVS vectors to demonstrate
that the crypto operation still delivers the right results.

[1] http://www.chronox.de/libkcapi.html

CC: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: NStephan Mueller <smueller@chronox.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

ccfe8c3f

kvm: x86: i8259: return initialized data on invalid-size read · c1a6bff2

由 Petr Matousek 提交于 3月 11, 2015

If data is read from PIC with invalid access size, the return data stays
uninitialized even though success is returned.

Fix this by always initializing the data.
Signed-off-by: NPetr Matousek <pmatouse@redhat.com>
Reported-by: NNadav Amit <nadav.amit@gmail.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

c1a6bff2

12 3月, 2015 2 次提交

x86/apic/numachip: Fix sibling map with NumaChip · c8a470ca

由 Daniel J Blueman 提交于 3月 12, 2015

On NumaChip systems, the physical processor ID assignment wasn't
accounting for the number of nodes in AMD multi-module
processors, giving an incorrect sibling map:

  $ cd /sys/devices/system/cpu/cpu29/topology
  $ grep . *
  core_id:5
  core_siblings:00000000,ff000000
  core_siblings_list:24-31
  physical_package_id:3
  thread_siblings:00000000,30000000
  thread_siblings_list:28-29

This fixes it:

  $ cd /sys/devices/system/cpu/cpu29/topology
  $ grep . *
  core_id:5
  core_siblings:00000000,ffff0000
  core_siblings_list:16-31
  physical_package_id:1
  thread_siblings:00000000,30000000
  thread_siblings_list:28-29
Signed-off-by: NDaniel J Blueman <daniel@numascale.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426135950-10110-1-git-send-email-daniel@numascale.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

c8a470ca

x86/platform, acpi: Bypass legacy PIC and PIT in ACPI hardware reduced mode · 7486341a

由 Li, Aubrey 提交于 3月 11, 2015

On a platform in ACPI Hardware-reduced mode, the legacy PIC and
PIT may not be initialized even though they may be present in
silicon. Touching these legacy components causes unexpected
results on the system.

On the Bay Trail-T(ASUS-T100) platform, touching these legacy
components blocks platform hardware low idle power state(S0ix)
during system suspend. So we should bypass them in ACPI hardware
reduced mode.
Suggested-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NLi Aubrey <aubrey.li@linux.intel.com>
Cc: <alan@linux.intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/54FFF81C.20703@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

7486341a

11 3月, 2015 1 次提交

kvm: move advertising of KVM_CAP_IRQFD to common code · dc9be0fa

由 Paolo Bonzini 提交于 3月 05, 2015

POWER supports irqfds but forgot to advertise them.  Some userspace does
not check for the capability, but others check it---thus they work on
x86 and s390 but not POWER.

To avoid that other architectures in the future make the same mistake, let
common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE.
Reported-and-tested-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 297e2105Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

dc9be0fa

10 3月, 2015 1 次提交

x86/asm/entry/32: Fix user_mode() misuses · 394838c9

由 Andy Lutomirski 提交于 3月 09, 2015

The one in do_debug() is probably harmless, but better safe than sorry.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d67deaa9df5458363623001f252d1aee3215d014.1425948056.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

394838c9

06 3月, 2015 3 次提交

x86/vdso: Fix the build on GCC5 · e8932869

由 Jiri Slaby 提交于 3月 05, 2015

On gcc5 the kernel does not link:

  ld: .eh_frame_hdr table[4] FDE at 0000000000000648 overlaps table[5] FDE at 0000000000000670.

Because prior GCC versions always emitted NOPs on ALIGN directives, but
gcc5 started omitting them.

.LSTARTFDEDLSI1 says:

        /* HACK: The dwarf2 unwind routines will subtract 1 from the
           return address to get an address in the middle of the
           presumed call instruction.  Since we didn't get here via
           a call, we need to include the nop before the real start
           to make up for it.  */
        .long .LSTART_sigreturn-1-.     /* PC-relative start address */

But commit 69d0627a ("x86 vDSO: reorder vdso32 code") from 2.6.25
replaced .org __kernel_vsyscall+32,0x90 by ALIGN right before
__kernel_sigreturn.

Of course, ALIGN need not generate any NOP in there. Esp. gcc5 collapses
vclock_gettime.o and int80.o together with no generated NOPs as "ALIGN".

So fix this by adding to that point at least a single NOP and make the
function ALIGN possibly with more NOPs then.

Kudos for reporting and diagnosing should go to Richard.
Reported-by: NRichard Biener <rguenther@suse.de>
Signed-off-by: NJiri Slaby <jslaby@suse.cz>
Acked-by: NAndy Lutomirski <luto@amacapital.net>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425543211-12542-1-git-send-email-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>

e8932869

x86/fpu/xsaves: Fix improper uses of __ex_table · 06c8173e

由 Quentin Casasnovas 提交于 3月 05, 2015

Commit:

  f31a9f7c ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area")

introduced alternative instructions for XSAVES/XRSTORS and commit:

  adb9d526 ("x86/xsaves: Add xsaves and xrstors support for booting time")

added support for the XSAVES/XRSTORS instructions at boot time.

Unfortunately both failed to properly protect them against faulting:

The 'xstate_fault' macro will use the closest label named '1'
backward and that ends up in the .altinstr_replacement section
rather than in .text. This means that the kernel will never find
in the __ex_table the .text address where this instruction might
fault, leading to serious problems if userspace manages to
trigger the fault.
Signed-off-by: NQuentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: NJamie Iles <jamie.iles@oracle.com>
[ Improved the changelog, fixed some whitespace noise. ]
Acked-by: NBorislav Petkov <bp@alien8.de>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Allan Xavier <mr.a.xavier@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: adb9d526 ("x86/xsaves: Add xsaves and xrstors support for booting time")
Fixes: f31a9f7c ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area")
Signed-off-by: NIngo Molnar <mingo@kernel.org>

06c8173e

x86/intel/quark: Select COMMON_CLK · 9ab6eb51

由 Andy Shevchenko 提交于 3月 05, 2015

The commit 8bbc2a13 ("x86/intel/quark: Add Intel Quark
platform support") introduced a minimal support of Intel Quark
SoC. That allows to use core parts of the SoC. However, the SPI,
I2C, and GPIO drivers can't be selected by kernel configuration
because they depend on COMMON_CLK. The patch adds a COMMON_CLK
selection to the platfrom definition to allow user choose the drivers.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: NOng, Boon Leong <boon.leong.ong@intel.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Darren Hart <dvhart@linux.intel.com>
Fixes: 8bbc2a13 ("x86/intel/quark: Add Intel Quark platform support")
Link: http://lkml.kernel.org/r/1425569044-2867-1-git-send-email-andriy.shevchenko@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

9ab6eb51

05 3月, 2015 1 次提交

x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization · 956421fb

由 Andy Lutomirski 提交于 3月 05, 2015

'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
the related state make sense for 'ret_from_sys_call'.  This is
entirely the wrong check.  TS_COMPAT would make a little more
sense, but there's really no point in keeping this optimization
at all.

This fixes a return to the wrong user CS if we came from int
0x80 in a 64-bit task.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
[ Backported from tip:x86/asm. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

956421fb

04 3月, 2015 1 次提交

x86/PCI/ACPI: Ignore resources consumed by host bridge itself · 63f1789e

由 Jiang Liu 提交于 3月 04, 2015

When parsing resources for PCI host bridge, we should ignore resources
consumed by host bridge itself and only report window resources available
to child PCI busses.

Fixes: 593669c2 (x86/PCI/ACPI: Use common ACPI resource interfaces ...)
Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

63f1789e

03 3月, 2015 1 次提交

KVM: SVM: fix interrupt injection (apic->isr_count always 0) · f563db4b

由 Radim Krčmář 提交于 2月 27, 2015

In commit b4eef9b3, we started to use hwapic_isr_update() != NULL
instead of kvm_apic_vid_enabled(vcpu->kvm). This didn't work because
SVM had it defined and "apicv" path in apic_{set,clear}_isr() does not
change apic->isr_count, because it should always be 1. The initial
value of apic->isr_count was based on kvm_apic_vid_enabled(vcpu->kvm),
which is always 0 for SVM, so KVM could have injected interrupts when it
shouldn't.

Fix it by implicitly setting SVM's hwapic_isr_update to NULL and make the
initial isr_count depend on hwapic_isr_update() for good measure.

Fixes: b4eef9b3 ("kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv")
Reported-and-tested-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

f563db4b

28 2月, 2015 1 次提交

x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too · 5b2bdbc8

由 Steven Rostedt 提交于 2月 27, 2015

Commit:

   1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")

added a shadow CR4 such that reads and writes that do not
modify the CR4 execute much faster than always reading the
register itself.

The change modified cpu_init() in common.c, so that the
shadow CR4 gets initialized before anything uses it.

Unfortunately, there's two cpu_init()s in common.c. There's
one for 64-bit and one for 32-bit. The commit only added
the shadow init to the 64-bit path, but the 32-bit path
needs the init too.

Link: http://lkml.kernel.org/r/20150227125208.71c36402@gandalf.local.home Fixes: 1e02ce4c "x86: Store a per-cpu shadow copy of CR4"
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Acked-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150227145019.2bdd4354@gandalf.local.homeSigned-off-by: NIngo Molnar <mingo@kernel.org>

5b2bdbc8

27 2月, 2015 1 次提交

x86/xen: correct bug in p2m list initialization · b8f05c88

由 Juergen Gross 提交于 2月 27, 2015

Commit 054954eb ("xen: switch to
linear virtual mapped sparse p2m list") introduced an error.

During initialization of the p2m list a p2m identity area mapped by
a complete identity pmd entry has to be split up into smaller chunks
sometimes, if a non-identity pfn is introduced in this area.

If this non-identity pfn is not at index 0 of a p2m page the new
p2m page needed is initialized with wrong identity entries, as the
identity pfns don't start with the value corresponding to index 0,
but with the initial non-identity pfn. This results in weird wrong
mappings.

Correct the wrong initialization by starting with the correct pfn.

Cc: stable@vger.kernel.org # 3.19
Reported-by: NStefan Bader <stefan.bader@canonical.com>
Signed-off-by: NJuergen Gross <jgross@suse.com>
Tested-by: NStefan Bader <stefan.bader@canonical.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

b8f05c88

24 2月, 2015 6 次提交

x86/platform/intel-mid: Fix trivial printk message typo in intel_mid_arch_setup() · 579deee5

由 Yannick Guerrini 提交于 2月 23, 2015

Change 'Uknown' to 'Unknown'
Signed-off-by: NYannick Guerrini <yguerrini@tomshardware.fr>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1424710358-10140-1-git-send-email-yguerrini@tomshardware.frSigned-off-by: NIngo Molnar <mingo@kernel.org>

579deee5

KVM: emulate: fix CMPXCHG8B on 32-bit hosts · 4ff6f8e6

由 Paolo Bonzini 提交于 2月 12, 2015

This has been broken for a long time: it broke first in 2.6.35, then was
almost fixed in 2.6.36 but this one-liner slipped through the cracks.
The bug shows up as an infinite loop in Windows 7 (and newer) boot on
32-bit hosts without EPT.

Windows uses CMPXCHG8B to write to page tables, which causes a
page fault if running without EPT; the emulator is then called from
kvm_mmu_page_fault.  The loop then happens if the higher 4 bytes are
not 0; the common case for this is that the NX bit (bit 63) is 1.

Fixes: 6550e1f1
Fixes: 16518d5a
Cc: stable@vger.kernel.org   # 2.6.35+
Reported-by: NErik Rull <erik.rull@rdsoftware.de>
Tested-by: NErik Rull <erik.rull@rdsoftware.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4ff6f8e6

KVM: VMX: fix build without CONFIG_SMP · 21bc8dc5

由 Radim Krčmář 提交于 2月 16, 2015

'apic' is not defined if !CONFIG_X86_64 && !CONFIG_X86_LOCAL_APIC.
Posted interrupt makes no sense without CONFIG_SMP, and
CONFIG_X86_LOCAL_APIC will be set with it.
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

21bc8dc5

x86/xen: Initialize cr4 shadow for 64-bit PV(H) guests · 5054daa2

由 Boris Ostrovsky 提交于 2月 23, 2015

Commit 1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")
introduced CR4 shadows.

These shadows are initialized in early boot code. The commit missed
initialization for 64-bit PV(H) guests that this patch adds.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

5054daa2

x86/xen: allow privcmd hypercalls to be preempted · fdfd811d

由 David Vrabel 提交于 2月 19, 2015

Hypercalls submitted by user space tools via the privcmd driver can
take a long time (potentially many 10s of seconds) if the hypercall
has many sub-operations.

A fully preemptible kernel may deschedule such as task in any upcall
called from a hypercall continuation.

However, in a kernel with voluntary or no preemption, hypercall
continuations in Xen allow event handlers to be run but the task
issuing the hypercall will not be descheduled until the hypercall is
complete and the ioctl returns to user space.  These long running
tasks may also trigger the kernel's soft lockup detection.

Add xen_preemptible_hcall_begin() and xen_preemptible_hcall_end() to
bracket hypercalls that may be preempted.  Use these in the privcmd
driver.

When returning from an upcall, call xen_maybe_preempt_hcall() which
adds a schedule point if if the current task was within a preemptible
hypercall.

Since _cond_resched() can move the task to a different CPU, clear and
set xen_in_preemptible_hcall around the call.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

fdfd811d

x86/xen: Make sure X2APIC_ENABLE bit of MSR_IA32_APICBASE is not set · 31795b47

由 Boris Ostrovsky 提交于 2月 11, 2015

Commit d524165c ("x86/apic: Check x2apic early") tests X2APIC_ENABLE
bit of MSR_IA32_APICBASE when CONFIG_X86_X2APIC is off and panics
the kernel when this bit is set.

Xen's PV guests will pass this MSR read to the hypervisor which will
return its version of the MSR, where this bit might be set. Make sure
we clear it before returning MSR value to the caller.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

31795b47

22 2月, 2015 1 次提交

x86/cpu/intel: Fix trivial typo in intel_tlb_table[] · a927792c

由 Yannick Guerrini 提交于 2月 21, 2015

Change 'ssociative' to 'associative'
Signed-off-by: NYannick Guerrini <yguerrini@tomshardware.fr>
Cc: Borislav Petkov <bp@suse.de>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Chris Bainbridge <chris.bainbridge@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Steven Honeyman <stevenhoneyman@gmail.com>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1424558510-1420-1-git-send-email-yguerrini@tomshardware.frSigned-off-by: NIngo Molnar <mingo@kernel.org>

a927792c

21 2月, 2015 2 次提交

kprobes/x86: Check for invalid ftrace location in __recover_probed_insn() · 2a6730c8

由 Petr Mladek 提交于 2月 20, 2015

__recover_probed_insn() should always be called from an address
where an instructions starts. The check for ftrace_location()
might help to discover a potential inconsistency.

This patch adds WARN_ON() when the inconsistency is detected.
Also it adds handling of the situation when the original code
can not get recovered.
Suggested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NPetr Mladek <pmladek@suse.cz>
Cc: Ananth NMavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1424441250-27146-3-git-send-email-pmladek@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>

2a6730c8

kprobes/x86: Use 5-byte NOP when the code might be modified by ftrace · 650b7b23

由 Petr Mladek 提交于 2月 20, 2015

can_probe() checks if the given address points to the beginning
of an instruction. It analyzes all the instructions from the
beginning of the function until the given address. The code
might be modified by another Kprobe. In this case, the current
code is read into a buffer, int3 breakpoint is replaced by the
saved opcode in the buffer, and can_probe() analyzes the buffer
instead.

There is a bug that __recover_probed_insn() tries to restore
the original code even for Kprobes using the ftrace framework.
But in this case, the opcode is not stored. See the difference
between arch_prepare_kprobe() and arch_prepare_kprobe_ftrace().
The opcode is stored by arch_copy_kprobe() only from
arch_prepare_kprobe().

This patch makes Kprobe to use the ideal 5-byte NOP when the
code can be modified by ftrace. It is the original instruction,
see ftrace_make_nop() and ftrace_nop_replace().

Note that we always need to use the NOP for ftrace locations.
Kprobes do not block ftrace and the instruction might get
modified at anytime. It might even be in an inconsistent state
because it is modified step by step using the int3 breakpoint.

The patch also fixes indentation of the touched comment.

Note that I found this problem when playing with Kprobes. I did
it on x86_64 with gcc-4.8.3 that supported -mfentry. I modified
samples/kprobes/kprobe_example.c and added offset 5 to put
the probe right after the fentry area:

 static struct kprobe kp = {
 	.symbol_name	= "do_fork",
+	.offset = 5,
 };

Then I was able to load kprobe_example before jprobe_example
but not the other way around:

  $> modprobe jprobe_example
  $> modprobe kprobe_example
  modprobe: ERROR: could not insert 'kprobe_example': Invalid or incomplete multibyte or wide character

It did not make much sense and debugging pointed to the bug
described above.
Signed-off-by: NPetr Mladek <pmladek@suse.cz>
Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth NMavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1424441250-27146-2-git-send-email-pmladek@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>

650b7b23

20 2月, 2015 2 次提交

x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch · 570e1aa8

由 Jiri Kosina 提交于 2月 20, 2015

Commit f47233c2 ("x86/mm/ASLR: Propagate base load address
calculation") causes PAGE_SIZE redefinition warnings for UML
subarch  builds. This is caused by added includes that were
leftovers from previous  patch versions are are not actually
needed (especially page_types.h  inlcude in module.c). Drop
those stray includes.
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1502201017240.28769@pobox.suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>

570e1aa8

x86: pte_protnone() and pmd_protnone() must check entry is not present · e3a1f6ca

由 David Vrabel 提交于 2月 19, 2015

Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if
_PAGE_PRESENT is clear.  Make pte_protnone() and pmd_protnone() check
for this.

This fixes a 64-bit Xen PV guest regression introduced by 8a0516ed
("mm: convert p[te|md]_numa users to p[te|md]_protnone_numa").  Any
userspace process would endlessly fault.

In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL set
by the hypervisor.  This meant that any fault on a present userspace
entry (e.g., a write to a read-only mapping) would be misinterpreted as
a NUMA hinting fault and the fault would not be correctly handled,
resulting in the access endlessly faulting.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NMel Gorman <mgorman@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e3a1f6ca

19 2月, 2015 4 次提交

x86/microcode/intel: Handle truncated microcode images more robustly · 35a9ff4e

由 Quentin Casasnovas 提交于 2月 03, 2015

We do not check the input data bounds containing the microcode before
copying a struct microcode_intel_header from it. A specially crafted
microcode could cause the kernel to read invalid memory and lead to a
denial-of-service.
Signed-off-by: NQuentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1422964824-22056-3-git-send-email-quentin.casasnovas@oracle.com
[ Made error message differ from the next one and flipped comparison. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

35a9ff4e

x86/microcode/intel: Guard against stack overflow in the loader · f84598bd

由 Quentin Casasnovas 提交于 2月 03, 2015

mc_saved_tmp is a static array allocated on the stack, we need to make
sure mc_saved_count stays within its bounds, otherwise we're overflowing
the stack in _save_mc(). A specially crafted microcode header could lead
to a kernel crash or potentially kernel execution.
Signed-off-by: NQuentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1422964824-22056-1-git-send-email-quentin.casasnovas@oracle.comSigned-off-by: NBorislav Petkov <bp@suse.de>

f84598bd

x86, mm/ASLR: Fix stack randomization on 64-bit systems · 4e7c22d4

由 Hector Marco-Gisbert 提交于 2月 14, 2015

The issue is that the stack for processes is not properly randomized on
64 bit architectures due to an integer overflow.

The affected function is randomize_stack_top() in file
"fs/binfmt_elf.c":

  static unsigned long randomize_stack_top(unsigned long stack_top)
  {
           unsigned int random_variable = 0;

           if ((current->flags & PF_RANDOMIZE) &&
                   !(current->personality & ADDR_NO_RANDOMIZE)) {
                   random_variable = get_random_int() & STACK_RND_MASK;
                   random_variable <<= PAGE_SHIFT;
           }
           return PAGE_ALIGN(stack_top) + random_variable;
           return PAGE_ALIGN(stack_top) - random_variable;
  }

Note that, it declares the "random_variable" variable as "unsigned int".
Since the result of the shifting operation between STACK_RND_MASK (which
is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64):

	  random_variable <<= PAGE_SHIFT;

then the two leftmost bits are dropped when storing the result in the
"random_variable". This variable shall be at least 34 bits long to hold
the (22+12) result.

These two dropped bits have an impact on the entropy of process stack.
Concretely, the total stack entropy is reduced by four: from 2^28 to
2^30 (One fourth of expected entropy).

This patch restores back the entropy by correcting the types involved
in the operations in the functions randomize_stack_top() and
stack_maxrandom_size().

The successful fix can be tested with:

  $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
  7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
  7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
  7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
  7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
  ...

Once corrected, the leading bytes should be between 7ffc and 7fff,
rather than always being 7fff.
Signed-off-by: NHector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: NIsmael Ripoll <iripoll@upv.es>
[ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: CVE-2015-1593
Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.netSigned-off-by: NBorislav Petkov <bp@suse.de>

4e7c22d4

x86/mm/init: Fix incorrect page size in init_memory_mapping() printks · f15e0518

由 Dave Hansen 提交于 2月 10, 2015

With 32-bit non-PAE kernels, we have 2 page sizes available
(at most): 4k and 4M.

Enabling PAE replaces that 4M size with a 2M one (which 64-bit
systems use too).

But, when booting a 32-bit non-PAE kernel, in one of our
early-boot printouts, we say:

  init_memory_mapping: [mem 0x00000000-0x000fffff]
   [mem 0x00000000-0x000fffff] page 4k
  init_memory_mapping: [mem 0x37000000-0x373fffff]
   [mem 0x37000000-0x373fffff] page 2M
  init_memory_mapping: [mem 0x00100000-0x36ffffff]
   [mem 0x00100000-0x003fffff] page 4k
   [mem 0x00400000-0x36ffffff] page 2M
  init_memory_mapping: [mem 0x37400000-0x377fdfff]
   [mem 0x37400000-0x377fdfff] page 4k

Which is obviously wrong.  There is no 2M page available.  This
is probably because of a badly-named variable: in the map_range
code: PG_LEVEL_2M.

Instead of renaming all the PG_LEVEL_2M's.  This patch just
fixes the printout:

  init_memory_mapping: [mem 0x00000000-0x000fffff]
   [mem 0x00000000-0x000fffff] page 4k
  init_memory_mapping: [mem 0x37000000-0x373fffff]
   [mem 0x37000000-0x373fffff] page 4M
  init_memory_mapping: [mem 0x00100000-0x36ffffff]
   [mem 0x00100000-0x003fffff] page 4k
   [mem 0x00400000-0x36ffffff] page 4M
  init_memory_mapping: [mem 0x37400000-0x377fdfff]
   [mem 0x37400000-0x377fdfff] page 4k
  BRK [0x03206000, 0x03206fff] PGTABLE
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20150210212030.665EC267@viggo.jf.intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

f15e0518

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功