提交 · c14fa0b63b5b4234667c03fdc3314c0881caa514 · openeuler / raspberrypi-kernel

18 11月, 2012 7 次提交

x86, mm: Find early page table buffer together · c14fa0b6

由 Yinghai Lu 提交于 11月 16, 2012

We should not do that in every calling of init_memory_mapping.

At the same time need to move down early_memtest, and could remove after_bootmem
checking.

-v2: fix one early_memtest with 32bit by passing max_pfn_mapped instead.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-8-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

c14fa0b6

x86, mm: Change find_early_table_space() paramters · 84f1ae30

由 Yinghai Lu 提交于 11月 16, 2012

call split_mem_range inside the function.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-7-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

84f1ae30

x86, mm: Revert back good_end setting for 64bit · 28b6ff66

由 Yinghai Lu 提交于 11月 16, 2012

After

| commit 8548c84d
| Author: Takashi Iwai <tiwai@suse.de>
| Date:   Sun Oct 23 23:19:12 2011 +0200
|
|    x86: Fix S4 regression
|
|    Commit 4b239f45 ("x86-64, mm: Put early page table high") causes a S4
|    regression since 2.6.39, namely the machine reboots occasionally at S4
|    resume.  It doesn't happen always, overall rate is about 1/20.  But,
|    like other bugs, once when this happens, it continues to happen.
|
|    This patch fixes the problem by essentially reverting the memory
|    assignment in the older way.

Have some page table around 512M again, that will prevent kdump to find 512M
under 768M.

We need revert that reverting, so we could put page table high again for 64bit.

Takashi agreed that S4 regression could be something else.

	https://lkml.org/lkml/2012/6/15/182Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-6-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

28b6ff66

x86, mm: Move init_memory_mapping calling out of setup.c · 22ddfcaa

由 Yinghai Lu 提交于 11月 16, 2012

Now init_memory_mapping is called two times, later will be called for every
ram ranges.

Could put all related init_mem calling together and out of setup.c.

Actually, it reverts commit 1bbbbe77
x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
will address that later with complete solution include handling hole under 4g.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-5-git-send-email-yinghai@kernel.orgReviewed-by: NPekka Enberg <penberg@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

22ddfcaa

x86, mm: Move down find_early_table_space() · 2086fe11

由 Yinghai Lu 提交于 11月 16, 2012

It will need to call split_mem_range().
Move it down after that to avoid extra declaration.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-4-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

2086fe11

x86, mm: Split out split_mem_range from init_memory_mapping · 4e33e065

由 Yinghai Lu 提交于 11月 16, 2012

So make init_memory_mapping smaller and readable.

-v2: use 0 instead of nr_range as input parameter found by Yasuaki Ishimatsu.
Suggested-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-3-git-send-email-yinghai@kernel.orgReviewed-by: NPekka Enberg <penberg@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

4e33e065

x86, mm: Add global page_size_mask and probe one time only · fa62aafe

由 Yinghai Lu 提交于 11月 16, 2012

Now we pass around use_gbpages and use_pse for calculating page table size,
Later we will need to call init_memory_mapping for every ram range one by one,
that mean those calculation will be done several times.

Those information are the same for all ram range and could be stored in
page_size_mask and could be probed it one time only.

Move that probing code out of init_memory_mapping into separated function
probe_page_size_mask(), and call it before all init_memory_mapping.
Suggested-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-2-git-send-email-yinghai@kernel.orgReviewed-by: NPekka Enberg <penberg@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

fa62aafe

17 11月, 2012 1 次提交

KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update() · 29282fde

由 Takashi Iwai 提交于 11月 09, 2012

The commit [ad756a16: KVM: VMX: Implement PCID/INVPCID for guests with
EPT] introduced the unconditional access to SECONDARY_VM_EXEC_CONTROL,
and this triggers kernel warnings like below on old CPUs:

    vmwrite error: reg 401e value a0568000 (err 12)
    Pid: 13649, comm: qemu-kvm Not tainted 3.7.0-rc4-test2+ #154
    Call Trace:
     [<ffffffffa0558d86>] vmwrite_error+0x27/0x29 [kvm_intel]
     [<ffffffffa054e8cb>] vmcs_writel+0x1b/0x20 [kvm_intel]
     [<ffffffffa054f114>] vmx_cpuid_update+0x74/0x170 [kvm_intel]
     [<ffffffffa03629b6>] kvm_vcpu_ioctl_set_cpuid2+0x76/0x90 [kvm]
     [<ffffffffa0341c67>] kvm_arch_vcpu_ioctl+0xc37/0xed0 [kvm]
     [<ffffffff81143f7c>] ? __vunmap+0x9c/0x110
     [<ffffffffa0551489>] ? vmx_vcpu_load+0x39/0x1a0 [kvm_intel]
     [<ffffffffa0340ee2>] ? kvm_arch_vcpu_load+0x52/0x1a0 [kvm]
     [<ffffffffa032dcd4>] ? vcpu_load+0x74/0xd0 [kvm]
     [<ffffffffa032deb0>] kvm_vcpu_ioctl+0x110/0x5e0 [kvm]
     [<ffffffffa032e93d>] ? kvm_dev_ioctl+0x4d/0x4a0 [kvm]
     [<ffffffff8117dc6f>] do_vfs_ioctl+0x8f/0x530
     [<ffffffff81139d76>] ? remove_vma+0x56/0x60
     [<ffffffff8113b708>] ? do_munmap+0x328/0x400
     [<ffffffff81187c8c>] ? fget_light+0x4c/0x100
     [<ffffffff8117e1a1>] sys_ioctl+0x91/0xb0
     [<ffffffff815a942d>] system_call_fastpath+0x1a/0x1f

This patch adds a check for the availability of secondary exec
control to avoid these warnings.

Cc: <stable@vger.kernel.org> [v3.6+]
Signed-off-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

29282fde

13 11月, 2012 1 次提交

KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461) · 6d1068b3

由 Petr Matousek 提交于 11月 06, 2012

On hosts without the XSAVE support unprivileged local user can trigger
oops similar to the one below by setting X86_CR4_OSXSAVE bit in guest
cr4 register using KVM_SET_SREGS ioctl and later issuing KVM_RUN
ioctl.

invalid opcode: 0000 [#2] SMP
Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
...
Pid: 24935, comm: zoog_kvm_monito Tainted: G      D      3.2.0-3-686-pae
EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
task.ti=d7c62000)
Stack:
 00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
 ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
 c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
Call Trace:
 [<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
...
 [<c12bfb44>] ? syscall_call+0x7/0xb
Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
0068:d7c63e70

QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
X86_FEATURE_XSAVE even on hosts that do not support it, might be
susceptible to this attack from inside the guest as well.

Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.
Signed-off-by: NPetr Matousek <pmatouse@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

6d1068b3

04 11月, 2012 1 次提交

xen/hypercall: fix hypercall fallback code for very old hypervisors · cf47a83f

由 Jan Beulich 提交于 10月 19, 2012

While copying the argument structures in HYPERVISOR_event_channel_op()
and HYPERVISOR_physdev_op() into the local variable is sufficiently
safe even if the actual structure is smaller than the container one,
copying back eventual output values the same way isn't: This may
collide with on-stack variables (particularly "rc") which may change
between the first and second memcpy() (i.e. the second memcpy() could
discard that change).

Move the fallback code into out-of-line functions, and handle all of
the operations known by this old a hypervisor individually: Some don't
require copying back anything at all, and for the rest use the
individual argument structures' sizes rather than the container's.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NJan Beulich <jbeulich@suse.com>
[v2: Reduce #define/#undef usage in HYPERVISOR_physdev_op_compat().]
[v3: Fix compile errors when modules use said hypercalls]
[v4: Add xen_ prefix to the HYPERCALL_..]
[v5: Alter the name and only EXPORT_SYMBOL_GPL one of them]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

cf47a83f

01 11月, 2012 2 次提交

KVM: x86: fix vcpu->mmio_fragments overflow · 87da7e66

由 Xiao Guangrong 提交于 10月 24, 2012

After commit b3356bf0 (KVM: emulator: optimize "rep ins" handling),
the pieces of io data can be collected and write them to the guest memory
or MMIO together

Unfortunately, kvm splits the mmio access into 8 bytes and store them to
vcpu->mmio_fragments. If the guest uses "rep ins" to move large data, it
will cause vcpu->mmio_fragments overflow

The bug can be exposed by isapc (-M isapc):

[23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ ......]
[23154.858083] Call Trace:
[23154.859874]  [<ffffffffa04f0e17>] kvm_get_cr8+0x1d/0x28 [kvm]
[23154.861677]  [<ffffffffa04fa6d4>] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
[23154.863604]  [<ffffffffa04f5a1a>] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]

Actually, we can use one mmio_fragment to store a large mmio access then
split it when we pass the mmio-exit-info to userspace. After that, we only
need two entries to store mmio info for the cross-mmio pages access
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

87da7e66

xen/mmu: Use Xen specific TLB flush instead of the generic one. · 95a7d768

由 Konrad Rzeszutek Wilk 提交于 10月 31, 2012

As Mukesh explained it, the MMUEXT_TLB_FLUSH_ALL allows the
hypervisor to do a TLB flush on all active vCPUs. If instead
we were using the generic one (which ends up being xen_flush_tlb)
we end up making the MMUEXT_TLB_FLUSH_LOCAL hypercall. But
before we make that hypercall the kernel will IPI all of the
vCPUs (even those that were asleep from the hypervisor
perspective). The end result is that we needlessly wake them
up and do a TLB flush when we can just let the hypervisor
do it correctly.

This patch gives around 50% speed improvement when migrating
idle guest's from one host to another.

Oracle-bug: 14630170

CC: stable@vger.kernel.org
Tested-by: NJingjie Jiang <jingjie.jiang@oracle.com>
Suggested-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

95a7d768

30 10月, 2012 1 次提交

x86: remove obsolete comment from asm/xen/hypervisor.h · b6514633

由 Olaf Hering 提交于 10月 25, 2012

Signed-off-by: NOlaf Hering <olaf@aepfle.de>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

b6514633

26 10月, 2012 2 次提交

x86, mm: Undo incorrect revert in arch/x86/mm/init.c · f82f64dd

由 Yinghai Lu 提交于 10月 25, 2012

Commit

844ab6f9 x86, mm: Find_early_table_space based on ranges that are actually being mapped

added back some lines back wrongly that has been removed in commit

7b16bbf9 Revert "x86/mm: Fix the size calculation of mapping tables"

remove them again.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQW_vuaYQbmagVnxT2DGsYc=9tNeAbdBq53sYkitPOwxSQ@mail.gmail.comAcked-by: NJacob Shin <jacob.shin@amd.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

f82f64dd

x86: efi: Turn off efi_enabled after setup on mixed fw/kernel · 5189c2a7

由 Olof Johansson 提交于 10月 24, 2012

When 32-bit EFI is used with 64-bit kernel (or vice versa), turn off
efi_enabled once setup is done. Beyond setup, it is normally used to
determine if runtime services are available and we will have none.

This will resolve issues stemming from efivars modprobe panicking on a
32/64-bit setup, as well as some reboot issues on similar setups.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=45991Reported-by: NMarko Kohtala <marko.kohtala@gmail.com>
Reported-by: NMaxim Kammerer <mk@dee.su>
Signed-off-by: NOlof Johansson <olof@lixom.net>
Acked-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: stable@kernel.org # 3.4 - 3.6
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>

5189c2a7

25 10月, 2012 3 次提交

x86, mm: Find_early_table_space based on ranges that are actually being mapped · 844ab6f9

由 Jacob Shin 提交于 10月 24, 2012

Current logic finds enough space for direct mapping page tables from 0
to end. Instead, we only need to find enough space to cover mr[0].start
to mr[nr_range].end -- the range that is actually being mapped by
init_memory_mapping()

This is needed after 1bbbbe77, to address
the panic reported here:

  https://lkml.org/lkml/2012/10/20/160
  https://lkml.org/lkml/2012/10/21/157Signed-off-by: NJacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20121024195311.GB11779@jshin-ToonieTested-by: NTom Rini <trini@ti.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

844ab6f9

x86, mm: Use memblock memory loop instead of e820_RAM · 1f2ff682

由 Yinghai Lu 提交于 10月 22, 2012

We need to handle E820_RAM and E820_RESERVED_KERNEL at the same time.

Also memblock has page aligned range for ram, so we could avoid mapping
partial pages.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.comAcked-by: NJacob Shin <jacob.shin@amd.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>

1f2ff682

x86, mm: Trim memory in memblock to be page aligned · 6ede1fd3

由 Yinghai Lu 提交于 10月 22, 2012

We will not map partial pages, so need to make sure memblock
allocation will not allocate those bytes out.

Also we will use for_each_mem_pfn_range() to loop to map memory
range to keep them consistent.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.comAcked-by: NJacob Shin <jacob.shin@amd.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>

6ede1fd3

24 10月, 2012 13 次提交

x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt · 94777fc5

由 Dimitri Sivanich 提交于 10月 16, 2012

Posting this patch to fix an issue concerning sparse irq's that
I raised a while back.  There was discussion about adding
refcounting to sparse irqs (to fix other potential race
conditions), but that does not appear to have been addressed
yet.  This covers the only issue of this type that I've
encountered in this area.

A NULL pointer dereference can occur in
smp_irq_move_cleanup_interrupt() if we haven't yet setup the
irq_cfg pointer in the irq_desc.irq_data.chip_data.

In create_irq_nr() there is a window where we have set
vector_irq in __assign_irq_vector(), but not yet called
irq_set_chip_data() to set the irq_cfg pointer.

Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during
this time, smp_irq_move_cleanup_interrupt() will attempt to
process the aforementioned irq, but panic when accessing
irq_cfg.

Only continue processing the irq if irq_cfg is non-NULL.
Signed-off-by: NDimitri Sivanich <sivanich@sgi.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20121016125021.GA22935@sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

94777fc5

perf/x86: Remove unused variable in nhmex_rbox_alter_er() · 64dfab8e

由 Wei Yongjun 提交于 10月 22, 2012

The variable port is initialized but never used
otherwise, so remove the unused variable.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/CAPgLHd8NZkYSkZm22FpZxiEh6HcA0q-V%3D29vdnheiDhgrJZ%2Byw@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

64dfab8e

x86/efi: Fix oops caused by incorrect set_memory_uc() usage · 3e8fa263

由 Matt Fleming 提交于 10月 19, 2012

Calling __pa() with an ioremap'd address is invalid. If we
encounter an efi_memory_desc_t without EFI_MEMORY_WB set in
->attribute we currently call set_memory_uc(), which in turn
calls __pa() on a potentially ioremap'd address.

On CONFIG_X86_32 this results in the following oops:

  BUG: unable to handle kernel paging request at f7f22280
  IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210
  *pdpt = 0000000001978001 *pde = 0000000001ffb067 *pte = 0000000000000000
  Oops: 0000 [#1] PREEMPT SMP
  Modules linked in:

  Pid: 0, comm: swapper Not tainted 3.0.0-acpi-efi-0805 #3
   EIP: 0060:[<c10257b9>] EFLAGS: 00010202 CPU: 0
   EIP is at reserve_ram_pages_type+0x89/0x210
   EAX: 0070e280 EBX: 38714000 ECX: f7814000 EDX: 00000000
   ESI: 00000000 EDI: 38715000 EBP: c189fef0 ESP: c189fea8
   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 0, ti=c189e000 task=c18bbe60 task.ti=c189e000)
  Stack:
   80000200 ff108000 00000000 c189ff00 00038714 00000000 00000000 c189fed0
   c104f8ca 00038714 00000000 00038715 00000000 00000000 00038715 00000000
   00000010 38715000 c189ff48 c1025aff 38715000 00000000 00000010 00000000
  Call Trace:
   [<c104f8ca>] ? page_is_ram+0x1a/0x40
   [<c1025aff>] reserve_memtype+0xdf/0x2f0
   [<c1024dc9>] set_memory_uc+0x49/0xa0
   [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa
   [<c19216d4>] start_kernel+0x291/0x2f2
   [<c19211c7>] ? loglevel+0x1b/0x1b
   [<c19210bf>] i386_start_kernel+0xbf/0xc8

The only time we can call set_memory_uc() for a memory region is
when it is part of the direct kernel mapping. For the case where
we ioremap a memory region we must leave it alone.

This patch reimplements the fix from e8c71062 ("x86, efi:
Calling __pa() with an ioremap()ed address is invalid") which
was reverted in e1ad783b because it caused a regression on
some MacBooks (they hung at boot). The regression was caused
because the commit only marked EFI_RUNTIME_SERVICES_DATA as
E820_RESERVED_EFI, when it should have marked all regions that
have the EFI_MEMORY_RUNTIME attribute.

Despite first impressions, it's not possible to use
ioremap_cache() to map all cached memory regions on
CONFIG_X86_64 because of the way that the memory map might be
configured as detailed in the following bug report,

	https://bugzilla.redhat.com/show_bug.cgi?id=748516

e.g. some of the EFI memory regions *need* to be mapped as part
of the direct kernel mapping.
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Huang Ying <huang.ying.caritas@gmail.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1350649546-23541-1-git-send-email-matt@console-pimps.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

3e8fa263

perf/x86: Enable overflow on Intel KNC with a custom knc_pmu_handle_irq() · e4074b30

由 Vince Weaver 提交于 10月 17, 2012

Although based on the Intel P6 design, the interrupt mechnanism
for KNC more closely resembles the Intel architectural
perfmon one.

We can't just re-use that code though, because KNC has different
MSR numbers for the status and ack registers.

In this case we just cut-and paste from perf_event_intel.c
with some minor changes, as it looks like it would not be
worth the trouble to change that code to be MSR-configurable.
Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: eranian@gmail.com
Cc: Meadows Lawrence F <lawrence.f.meadows@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171304410.23243@vincent-weaver-1.um.maine.edu
[ Small stylistic edits. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

e4074b30

perf/x86: Remove cpuc->enable check on Intl KNC event enable/disable · 7d011962

由 Vince Weaver 提交于 10月 17, 2012

x86_pmu.enable() is called from x86_pmu_enable() with
cpuc->enabled set to 0.  This means we weren't re-enabling the
counters after a context switch.

This patch just removes the check, as it should't be necessary
(and the equivelent x86_ generic code does not have the checks).

The origin of this problem is the KNC driver being based on the
P6 one.   The P6 driver also has this issue, but works anyway
due to various lucky accidents.
Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: eranian@gmail.com
Cc: Meadows
Cc: Lawrence F <lawrence.f.meadows@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171303290.23243@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>

7d011962

perf/x86: Make Intel KNC use full 40-bit width of counters · ae5ba47a

由 Vince Weaver 提交于 10月 17, 2012

Early versions of Intel KNC chips have a bug where bits above 32
were not properly set.  We worked around this by only using the
bottom 32 bits (out of 40 that should be available).

It turns out this workaround breaks overflow handling.

The buggy silicon will in theory never be used in production
systems, so remove this workaround so we get proper overflow
support.
Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: eranian@gmail.com
Cc: Meadows Lawrence F <lawrence.f.meadows@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171302140.23243@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>

ae5ba47a

perf/x86/uncore: Handle pci_read_config_dword() errors · 032c3851

由 Yan, Zheng 提交于 10月 24, 2012

This, beyond handling corner cases, also fixes some build warnings:

arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_disable_box’:
arch/x86/kernel/cpu/perf_event_intel_uncore.c:124:9: warning: ‘config’ is used uninitialized in this function [-Wuninitialized]
arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_enable_box’:
arch/x86/kernel/cpu/perf_event_intel_uncore.c:135:9: warning: ‘config’ is used uninitialized in this function [-Wuninitialized]
arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_read_counter’:
arch/x86/kernel/cpu/perf_event_intel_uncore.c:164:2: warning: ‘count’ is used uninitialized in this function [-Wuninitialized]
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Link: http://lkml.kernel.org/r/1351068140-13456-1-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

032c3851

x86-64: Fix page table accounting · 876ee61a

由 Jan Beulich 提交于 10月 04, 2012

Commit 20167d34 ("x86-64: Fix
accounting in kernel_physical_mapping_init()") went a little too
far by entirely removing the counting of pre-populated page
tables: this should be done at boot time (to cover the page
tables set up in early boot code), but shouldn't be done during
memory hot add.

Hence, re-add the removed increments of "pages", but make them
and the one in phys_pte_init() conditional upon !after_bootmem.
Reported-Acked-and-Tested-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/506DAFBA020000780009FA8C@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

876ee61a

perf/x86: Remove P6 cpuc->enabled check · 58e9eaf0

由 Vince Weaver 提交于 10月 19, 2012

Between 2.6.33 and 2.6.34 the PMU code was made modular.

The x86_pmu_enable() call was extended to disable cpuc->enabled
and iterate the counters, enabling one at a time, before calling
enable_all() at the end, followed by re-enabling cpuc->enabled.

Since cpuc->enabled was set to 0, that change effectively caused
the "val |= ARCH_PERFMON_EVENTSEL_ENABLE;" code in p6_pmu_enable_event()
and p6_pmu_disable_event() to be dead code that was never called.

This change removes this code (which was confusing) and adds some
extra commentary to make it more clear what is going on.
Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191732000.14552@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>

58e9eaf0

perf/x86: Update/fix generic events on P6 PMU · e09df478

由 Vince Weaver 提交于 10月 19, 2012

This patch updates the generic events on p6, including some new
extended cache events.

Values for these events were taken from the equivelant PAPI
predefined events.

Tested on a Pentium II.
Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191730080.14552@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>

e09df478

perf/x86: Fix P6 FP_ASSIST event constraint · 7991c9ca

由 Vince Weaver 提交于 10月 19, 2012

According to Intel SDM Volume 3B, FP_ASSIST is limited to Counter 1 only,
not Counter 0.

Tested on a Pentium II.
Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191728570.14552@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>

7991c9ca

Revert "x86/mm: Fix the size calculation of mapping tables" · 7b16bbf9

由 Dave Young 提交于 10月 18, 2012

Commit:

   722bc6b1 x86/mm: Fix the size calculation of mapping tables

Tried to address the issue that the first 2/4M should use 4k pages
if PSE enabled, but extra counts should only be valid for x86_32.

This commit caused a kdump regression: the kdump kernel hangs.

Work is in progress to fundamentally fix the various page table
initialization issues that we have, via the design suggested
by H. Peter Anvin, but it's not ready yet to be merged.

So, to get a working kdump revert to the last known working version,
which is the revert of this commit and of a followup fix (which was
incomplete):

   bd2753b2 x86/mm: Only add extra pages count for the first memory range during pre-allocation

Tested kdump on physical and virtual machines.
Signed-off-by: NDave Young <dyoung@redhat.com>
Acked-by: NYinghai Lu <yinghai@kernel.org>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Acked-by: NFlavio Leitner <fbl@redhat.com>
Tested-by: NFlavio Leitner <fbl@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Flavio Leitner <fbl@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: ianfang.cn@gmail.com
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

7b16bbf9

x86/perf: Fix virtualization sanity check · bffd5fc2

由 Andre Przywara 提交于 10月 09, 2012

In check_hw_exists() we try to detect non-emulated MSR accesses
by writing an arbitrary value into one of the PMU registers
and check if it's value after a readout is still the same.
This algorithm silently assumes that the register does not contain
the magic value already, which is wrong in at least one situation.

Fix the algorithm to really do a read-modify-write cycle. This fixes
a warning under Xen under some circumstances on AMD family 10h CPUs.

The reasons in more details actually sound like a story from
Believe It or Not!:

First you need an AMD family 10h/12h CPU. These do not reset the
PERF_CTR registers on a reboot.
Now you boot bare metal Linux, which goes successfully through this
check, but leaves the magic value of 0xabcd in the register. You
don't use the performance counters, but do a reboot (warm reset).
Then you choose to boot Xen. The check will be triggered with a
recent Linux kernel as Dom0 again, trying to write 0xabcd into the
MSR. Xen silently drops the write (expected), but the subsequent read
will return the value in the register, which just happens to be the
expected magic value. Thus the test misleadingly succeeds, leaving
the kernel in the belief that the PMU is available. This will trigger
the following message:

[    0.020294] ------------[ cut here ]------------
[    0.020311] WARNING: at arch/x86/xen/enlighten.c:730 xen_apic_write+0x15/0x17()
[    0.020318] Hardware name: empty
[    0.020323] Modules linked in:
[    0.020334] Pid: 1, comm: swapper/0 Not tainted 3.3.8 #7
[    0.020340] Call Trace:
[    0.020354]  [<ffffffff81050379>] warn_slowpath_common+0x80/0x98
[    0.020369]  [<ffffffff810503a6>] warn_slowpath_null+0x15/0x17
[    0.020378]  [<ffffffff810034df>] xen_apic_write+0x15/0x17
[    0.020392]  [<ffffffff8101cb2b>] perf_events_lapic_init+0x2e/0x30
[    0.020410]  [<ffffffff81ee4dd0>] init_hw_perf_events+0x250/0x407
[    0.020419]  [<ffffffff81ee4b80>] ? check_bugs+0x2d/0x2d
[    0.020430]  [<ffffffff81002181>] do_one_initcall+0x7a/0x131
[    0.020444]  [<ffffffff81edbbf9>] kernel_init+0x91/0x15d
[    0.020456]  [<ffffffff817caaa4>] kernel_thread_helper+0x4/0x10
[    0.020471]  [<ffffffff817c347c>] ? retint_restore_args+0x5/0x6
[    0.020481]  [<ffffffff817caaa0>] ? gs_change+0x13/0x13
[    0.020500] ---[ end trace a7919e7f17c0a725 ]---

The new code will change every of the 16 low bits read from the
register and tries to write and read-back that modified number
from the MSR.
Signed-off-by: NAndre Przywara <andre.przywara@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Link: http://lkml.kernel.org/r/1349797115-28346-2-git-send-email-andre.przywara@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

bffd5fc2

23 10月, 2012 3 次提交

KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT · c5e015d4

由 Sasha Levin 提交于 10月 19, 2012

KVM_PV_REASON_PAGE_NOT_PRESENT kicks cpu out of idleness, but we haven't
marked that spot as an exit from idleness.

Not doing so can cause RCU warnings such as:

[  732.788386] ===============================
[  732.789803] [ INFO: suspicious RCU usage. ]
[  732.790032] 3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63 Tainted: G        W
[  732.790032] -------------------------------
[  732.790032] include/linux/rcupdate.h:738 rcu_read_lock() used illegally while idle!
[  732.790032]
[  732.790032] other info that might help us debug this:
[  732.790032]
[  732.790032]
[  732.790032] RCU used illegally from idle CPU!
[  732.790032] rcu_scheduler_active = 1, debug_locks = 1
[  732.790032] RCU used illegally from extended quiescent state!
[  732.790032] 2 locks held by trinity-child31/8252:
[  732.790032]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff83a67528>] __schedule+0x178/0x8f0
[  732.790032]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81152bde>] cpuacct_charge+0xe/0x200
[  732.790032]
[  732.790032] stack backtrace:
[  732.790032] Pid: 8252, comm: trinity-child31 Tainted: G        W    3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63
[  732.790032] Call Trace:
[  732.790032]  [<ffffffff8118266b>] lockdep_rcu_suspicious+0x10b/0x120
[  732.790032]  [<ffffffff81152c60>] cpuacct_charge+0x90/0x200
[  732.790032]  [<ffffffff81152bde>] ? cpuacct_charge+0xe/0x200
[  732.790032]  [<ffffffff81158093>] update_curr+0x1a3/0x270
[  732.790032]  [<ffffffff81158a6a>] dequeue_entity+0x2a/0x210
[  732.790032]  [<ffffffff81158ea5>] dequeue_task_fair+0x45/0x130
[  732.790032]  [<ffffffff8114ae29>] dequeue_task+0x89/0xa0
[  732.790032]  [<ffffffff8114bb9e>] deactivate_task+0x1e/0x20
[  732.790032]  [<ffffffff83a67c29>] __schedule+0x879/0x8f0
[  732.790032]  [<ffffffff8117e20d>] ? trace_hardirqs_off+0xd/0x10
[  732.790032]  [<ffffffff810a37a5>] ? kvm_async_pf_task_wait+0x1d5/0x2b0
[  732.790032]  [<ffffffff83a67cf5>] schedule+0x55/0x60
[  732.790032]  [<ffffffff810a37c4>] kvm_async_pf_task_wait+0x1f4/0x2b0
[  732.790032]  [<ffffffff81139e50>] ? abort_exclusive_wait+0xb0/0xb0
[  732.790032]  [<ffffffff81139c25>] ? prepare_to_wait+0x25/0x90
[  732.790032]  [<ffffffff810a3a66>] do_async_page_fault+0x56/0xa0
[  732.790032]  [<ffffffff83a6a6e8>] async_page_fault+0x28/0x30
Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
Acked-by: NGleb Natapov <gleb@redhat.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

c5e015d4

KVM: apic: fix LDR calculation in x2apic mode · 7f46ddbd

由 Gleb Natapov 提交于 10月 14, 2012

Signed-off-by: NGleb Natapov <gleb@redhat.com>
Reviewed-by: NChegu Vinod  <chegu_vinod@hp.com>
Tested-by: NChegu Vinod <chegu_vinod@hp.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

7f46ddbd

KVM: MMU: fix release noslot pfn · f3ac1a4b

由 Xiao Guangrong 提交于 10月 16, 2012

We can not directly call kvm_release_pfn_clean to release the pfn
since we can meet noslot pfn which is used to cache mmio info into
spte
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: NAvi Kivity <avi@redhat.com>

f3ac1a4b

20 10月, 2012 5 次提交

perf/x86: Disable uncore on virtualized CPUs · a05123bd

由 Yan, Zheng 提交于 8月 21, 2012

Initializing uncore PMU on virtualized CPU may hang the kernel.
This is because kvm does not emulate the entire hardware. Thers
are lots of uncore related MSRs, making kvm enumerate them all
is a non-trival task. So just disable uncore on virtualized CPU.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Tested-by: NPekka Enberg <penberg@kernel.org>
Cc: a.p.zijlstra@chello.nl
Cc: eranian@google.com
Cc: andi@firstfloor.org
Cc: avi@redhat.com
Link: http://lkml.kernel.org/r/1345540117-14164-1-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a05123bd

xen/x86: don't corrupt %eip when returning from a signal handler · a349e23d

由 David Vrabel 提交于 10月 19, 2012

In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS
(-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event
/and/ the process has a pending signal then %eip (and %eax) are
corrupted when returning to the main process after handling the
signal.  The application may then crash with SIGSEGV or a SIGILL or it
may have subtly incorrect behaviour (depending on what instruction it
returned to).

The occurs because handle_signal() is incorrectly thinking that there
is a system call that needs to restarted so it adjusts %eip and %eax
to re-execute the system call instruction (even though user space had
not done a system call).

If %eax == -514 (-ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK
(-516) then handle_signal() only corrupted %eax (by setting it to
-EINTR).  This may cause the application to crash or have incorrect
behaviour.

handle_signal() assumes that regs->orig_ax >= 0 means a system call so
any kernel entry point that is not for a system call must push a
negative value for orig_ax.  For example, for physical interrupts on
bare metal the inverse of the vector is pushed and page_fault() sets
regs->orig_ax to -1, overwriting the hardware provided error code.

xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax
instead of -1.

Classic Xen kernels pushed %eax which works as %eax cannot be both
non-negative and -RESTARTSYS (etc.), but using -1 is consistent with
other non-system call entry points and avoids some of the tests in
handle_signal().

There were similar bugs in xen_failsafe_callback() of both 32 and
64-bit guests. If the fault was corrected and the normal return path
was used then 0 was incorrectly pushed as the value for orig_ax.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NJan Beulich <JBeulich@suse.com>
Acked-by: NIan Campbell <ian.campbell@citrix.com>
Cc: stable@vger.kernel.org
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

a349e23d

xen: grant: use xen_pfn_t type for frame_list. · ef32f892

由 Ian Campbell 提交于 10月 17, 2012

This correctly sizes it as 64 bit on ARM but leaves it as unsigned
long on x86 (therefore no intended change on x86).

The long and ulong guest handles are now unused (and a bit dangerous)
so remove them.
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

ef32f892

xen: sysfs: fix build warning. · 37ea0fcb

由 Ian Campbell 提交于 10月 17, 2012

Define PRI macros for xen_ulong_t and xen_pfn_t and use to fix:
drivers/xen/sys-hypervisor.c:288:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'xen_ulong_t' [-Wformat]

Ideally this would use PRIx64 on ARM but these (or equivalent) don't
seem to be available in the kernel.
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

37ea0fcb

xen/x86: remove duplicated include from enlighten.c · c2103b7e

由 Wei Yongjun 提交于 10月 07, 2012

Remove duplicated include.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

CC: stable@vger.kernel.org
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

c2103b7e

19 10月, 2012 1 次提交

x86, MCE: Remove bios_cmci_threshold sysfs attribute · 5bc66170

由 Borislav Petkov 提交于 10月 18, 2012

450cc201 ("x86/mce: Provide boot argument to honour bios-set CMCI
threshold") added the bios_cmci_threshold sysfs attribute which was
supposed to communicate to userspace tools that BIOS CMCI threshold has
been honoured.

However, this info is not of any importance to userspace - it should
rather get the actual error count it has been thresholded already from
MCi_STATUS[38:52].

So drop this before it becomes a used interface (good thing we caught
this early in 3.7-rc1, right after the merge window closed).

Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: NTony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20121017105940.GA14590@x1.osrc.amd.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>

5bc66170