提交 · e850f9c33c0c7cc4097ae29f6f8d633237d235e6 · openanolis / cloud-kernel

21 4月, 2013 3 次提交

perf/x86/intel: Add Ivy Bridge-EP uncore support · e850f9c3

由 Yan, Zheng 提交于 4月 16, 2013

The uncore subsystem in Ivy Bridge-EP is similar to Sandy
Bridge-EP. There are some differences in config register
encoding and pci device IDs. The Ivy Bridge-EP uncore also
supports a few new events.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1366113067-3262-4-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

e850f9c3

perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management · 46bdd905

由 Yan, Zheng 提交于 4月 16, 2013

The existing code assumes all Cbox and PCU events are using
filter, but actually the filter is event specific. Furthermore
the filter is sub-divided into multiple fields which are used
by different events.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1366113067-3262-3-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
Reported-by: NStephane Eranian <eranian@google.com>

46bdd905

perf/x86: Avoid kfree() in CPU_{STARTING,DYING} · 22cc4ccf

由 Yan, Zheng 提交于 4月 16, 2013

On -rt kfree() can schedule, but CPU_{STARTING,DYING} should be
atomic. So use a list to defer kfree until CPU_{ONLINE,DEAD}.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1366113067-3262-2-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

22cc4ccf

16 4月, 2013 1 次提交

perf/x86: Fix offcore_rsp valid mask for SNB/IVB · f1923820

由 Stephane Eranian 提交于 4月 16, 2013

The valid mask for both offcore_response_0 and
offcore_response_1 was wrong for SNB/SNB-EP,
IVB/IVB-EP. It was possible to write to
reserved bit and cause a GP fault crashing
the kernel.

This patch fixes the problem by correctly marking the
reserved bits in the valid mask for all the processors
mentioned above.

A distinction between desktop and server parts is introduced
because bits 24-30 are only available on the server parts.

This version of the  patch is just a rebase to perf/urgent tree
and should apply to older kernels as well.
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: security@kernel.org
Cc: ak@linux.intel.com
Signed-off-by: NIngo Molnar <mingo@kernel.org>

f1923820

13 4月, 2013 2 次提交

uretprobes/x86: Hijack return address · 791eca10

由 Anton Arapov 提交于 4月 03, 2013

Hijack the return address and replace it with a trampoline address.
Signed-off-by: NAnton Arapov <anton@redhat.com>
Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>

791eca10

x86-32: Fix possible incomplete TLB invalidate with PAE pagetables · 1de14c3c

由 Dave Hansen 提交于 4月 12, 2013

This patch attempts to fix:

	https://bugzilla.kernel.org/show_bug.cgi?id=56461

The symptom is a crash and messages like this:

	chrome: Corrupted page table at address 34a03000
	*pdpt = 0000000000000000 *pde = 0000000000000000
	Bad pagetable: 000f [#1] PREEMPT SMP

Ingo guesses this got introduced by commit 611ae8e3 ("x86/tlb:
enable tlb flush range support for x86") since that code started to free
unused pagetables.

On x86-32 PAE kernels, that new code has the potential to free an entire
PMD page and will clear one of the four page-directory-pointer-table
(aka pgd_t entries).

The hardware aggressively "caches" these top-level entries and invlpg
does not actually affect the CPU's copy.  If we clear one we *HAVE* to
do a full TLB flush, otherwise we might continue using a freed pmd page.
(note, we do this properly on the population side in pud_populate()).

This patch tracks whenever we clear one of these entries in the 'struct
mmu_gather', and ensures that we follow up with a full tlb flush.

BTW, I disassembled and checked that:

	if (tlb->fullmm == 0)
and
	if (!tlb->fullmm && !tlb->need_flush_all)

generate essentially the same code, so there should be zero impact there
to the !PAE case.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Artem S Tashkinov <t.artem@mailcity.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1de14c3c

12 4月, 2013 2 次提交

x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set · 26564600

由 Boris Ostrovsky 提交于 4月 11, 2013

When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
kernel_map_pages() are not made visible (via TLB flush)
immediately if lazy MMU is on. In environments that support lazy
MMU (e.g. Xen) this may lead to fatal page faults, for example,
when zap_pte_range() needs to allocate pages in
__tlb_remove_page() -> tlb_next_batch().
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: konrad.wilk@oracle.com
Link: http://lkml.kernel.org/r/1365703192-2089-1-git-send-email-boris.ostrovsky@oracle.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

26564600

x86/mm/cpa/selftest: Fix false positive in CPA self test · 18699739

由 Andrea Arcangeli 提交于 4月 11, 2013

If the pmd is not present, _PAGE_PSE will not be set anymore.
Fix the false positive.
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1365687369-30802-1-git-send-email-aarcange@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

18699739

11 4月, 2013 3 次提交

x86/mm/cpa: Convert noop to functional fix · f76cfa3c

由 Andrea Arcangeli 提交于 4月 10, 2013

Commit:

  a8aed3e0 ("x86/mm/pageattr: Prevent PSE and GLOABL leftovers to confuse pmd/pte_present and pmd_huge")

introduced a valid fix but one location that didn't trigger the bug that
lead to finding those (small) problems, wasn't updated using the
right variable.

The wrong variable was also initialized for no good reason, that
may have been the source of the confusion. Remove the noop
initialization accordingly.

Commit a8aed3e0 also erroneously removed one canon_pgprot pass meant
to clear pmd bitflags not supported in hardware by older CPUs, that
automatically gets corrected by this patch too by applying it to the right
variable in the new location.
Reported-by: NStefan Bader <stefan.bader@canonical.com>
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Acked-by: NBorislav Petkov <bp@alien8.de>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/1365600505-19314-1-git-send-email-aarcange@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f76cfa3c

x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal · 511ba86e

由 Boris Ostrovsky 提交于 3月 23, 2013

Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable() which may have performance impact.

Since lazy MMU is not used on bare metal we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such
environment.

[ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
  updates" may cause a minor performance regression on
  bare metal.  This patch resolves that performance regression.  It is
  somewhat unclear to me if this is a good -stable candidate. ]
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.comTested-by: NJosh Boyer <jwboyer@redhat.com>
Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> SEE NOTE ABOVE

511ba86e

x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates · 1160c277

由 Samu Kallio 提交于 3月 23, 2013

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The OOPs usually looks as so:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>]  [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
 [<ffffffff81627759>] do_page_fault+0x399/0x4b0
 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
 [<ffffffff81624065>] page_fault+0x25/0x30
 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
 [<ffffffff81154962>] unmap_vmas+0x52/0xa0
 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81059ce3>] mmput+0x83/0xf0
 [<ffffffff810624c4>] exit_mm+0x104/0x130
 [<ffffffff8106264a>] do_exit+0x15a/0x8c0
 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
 [<ffffffff81063177>] sys_exit_group+0x17/0x20
 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.

Cc: <stable@vger.kernel.org>
RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737Tested-by: NJosh Boyer <jwboyer@redhat.com>
Reported-and-Tested-by: NKrishna Raman <kraman@redhat.com>
Signed-off-by: NSamu Kallio <samu.kallio@aberdeencloud.com>
Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.comTested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

1160c277

10 4月, 2013 1 次提交

perf/x86: Add Sandy Bridge constraints for CYCLE_ACTIVITY.* · f8378f52

由 Andi Kleen 提交于 3月 08, 2013

Add CYCLE_ACTIVITY.CYCLES_NO_DISPATCH/CYCLES_L1D_PENDING constraints.

These recently documented events have restrictions to counter
0-3 and counter 2 respectively. The perf scheduler needs to know
that to schedule them correctly.

IvyBridge already has the necessary constraints.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1362784968-12542-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

f8378f52

08 4月, 2013 1 次提交

kprobes/x86: Just return error for sanity check failure instead of using BUG_ON · 8101376d

由 Masami Hiramatsu 提交于 4月 04, 2013

Return an error from __copy_instruction() and use printk() to
give us a more productive message, since this is just an error
case which we can handle and also the BUG_ON() never tells us
why and what happened.

This is related to the following bug-report:

   https://bugzilla.redhat.com/show_bug.cgi?id=910649Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20130404104230.22862.85242.stgit@mhiramat-M0-7522Signed-off-by: NIngo Molnar <mingo@kernel.org>

8101376d

07 4月, 2013 1 次提交

KVM: Allow cross page reads and writes from cached translations. · 8f964525

由 Andrew Honig 提交于 3月 29, 2013

This patch adds support for kvm_gfn_to_hva_cache_init functions for
reads and writes that will cross a page.  If the range falls within
the same memslot, then this will be a fast operation.  If the range
is split between two memslots, then the slower kvm_read_guest and
kvm_write_guest are used.

Tested: Test against kvm_clock unit tests.
Signed-off-by: NAndrew Honig <ahonig@google.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

8f964525

06 4月, 2013 1 次提交

x86: Fix rebuild with EFI_STUB enabled · 91870824

由 Jan Beulich 提交于 4月 03, 2013

eboot.o and efi_stub_$(BITS).o didn't get added to "targets", and hence
their .cmd files don't get included by the build machinery, leading to
the files always getting rebuilt.

Rather than adding the two files individually, take the opportunity and
add $(VMLINUX_OBJS) to "targets" instead, thus allowing the assignment
at the top of the file to be shrunk quite a bit.

At the same time, remove a pointless flags override line - the variable
assigned to was misspelled anyway, and the options added are
meaningless for assembly sources.

[ hpa: the patch is not minimal, but I am taking it for -urgent anyway
  since the excess impact of the patch seems to be small enough. ]
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/515C5D2502000078000CA6AD@nat28.tlf.novell.com
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

91870824

03 4月, 2013 2 次提交

x86: remove the x32 syscall bitmask from syscall_get_nr() · 8b4b9f27

由 Paul Moore 提交于 2月 15, 2013

Commit fca460f9 simplified the x32
implementation by creating a syscall bitmask, equal to 0x40000000, that
could be applied to x32 syscalls such that the masked syscall number
would be the same as a x86_64 syscall.  While that patch was a nice
way to simplify the code, it went a bit too far by adding the mask to
syscall_get_nr(); returning the masked syscall numbers can cause
confusion with callers that expect syscall numbers matching the x32
ABI, e.g. unmasked syscall numbers.

This patch fixes this by simply removing the mask from syscall_get_nr()
while preserving the other changes from the original commit.  While
there are several syscall_get_nr() callers in the kernel, most simply
check that the syscall number is greater than zero, in this case this
patch will have no effect.  Of those remaining callers, they appear
to be few, seccomp and ftrace, and from my testing of seccomp without
this patch the original commit definitely breaks things; the seccomp
filter does not correctly filter the syscalls due to the difference in
syscall numbers in the BPF filter and the value from syscall_get_nr().
Applying this patch restores the seccomp BPF filter functionality on
x32.

I've tested this patch with the seccomp BPF filters as well as ftrace
and everything looks reasonable to me; needless to say general usage
seemed fine as well.
Signed-off-by: NPaul Moore <pmoore@redhat.com>
Link: http://lkml.kernel.org/r/20130215172143.12549.10292.stgit@localhost
Cc: <stable@vger.kernel.org>
Cc: Will Drewry <wad@chromium.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

8b4b9f27

xen/mmu: On early bootup, flush the TLB when changing RO->RW bits Xen provided pagetables. · b2222794

由 Konrad Rzeszutek Wilk 提交于 3月 29, 2013

Occassionaly on a DL380 G4 the guest would crash quite early with this:

(XEN) d244:v0: unhandled page fault (ec=0003)
(XEN) Pagetable walk from ffffffff84dc7000:
(XEN)  L4[0x1ff] = 00000000c3f18067 0000000000001789
(XEN)  L3[0x1fe] = 00000000c3f14067 000000000000178d
(XEN)  L2[0x026] = 00000000dc8b2067 0000000000004def
(XEN)  L1[0x1c7] = 00100000dc8da067 0000000000004dc7
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 244 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.1.3OVM  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e033:[<ffffffff81263f22>]
(XEN) RFLAGS: 0000000000000216   EM: 1   CONTEXT: pv guest
(XEN) rax: 0000000000000000   rbx: ffffffff81785f88   rcx: 000000000000003f
(XEN) rdx: 0000000000000000   rsi: 00000000dc8da063   rdi: ffffffff84dc7000

The offending code shows it to be a loop writting the value zero
(%rax) in the %rdi (the L4 provided by Xen) register:

   0: 44 00 00             add    %r8b,(%rax)
   3: 31 c0                 xor    %eax,%eax
   5: b9 40 00 00 00       mov    $0x40,%ecx
   a: 66 0f 1f 84 00 00 00 nopw   0x0(%rax,%rax,1)
  11: 00 00
  13: ff c9                 dec    %ecx
  15:* 48 89 07             mov    %rax,(%rdi)     <-- trapping instruction
  18: 48 89 47 08           mov    %rax,0x8(%rdi)
  1c: 48 89 47 10           mov    %rax,0x10(%rdi)

which fails. xen_setup_kernel_pagetable recycles some of the Xen's
page-table entries when it has switched over to its Linux page-tables.

Right before try to clear the page, we  make a hypercall to change
it from _RO to  _RW and that works (otherwise we would hit an BUG()).
And the _RW flag is set for that page:
(XEN)  L1[0x1c7] = 001000004885f067 0000000000004dc7

The error code is 3, so PFEC_page_present and PFEC_write_access, so page is
present (correct), and we tried to write to the page, but a violation
occurred. The one theory is that the the page entries in hardware
(which are cached) are not up to date with what we just set. Especially
as we have just done an CR3 write and flushed the multicalls.

This patch does solve the problem by flusing out the TLB page
entry after changing it from _RO to _RW and we don't hit this
issue anymore.

Fixed-Oracle-Bug: 16243091 [ON OCCASIONS VM START GOES INTO
'CRASH' STATE: CLEAR_PAGE+0X12 ON HP DL380 G4]
Reported-and-Tested-by: NSaar Maoz <Saar.Maoz@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

b2222794

01 4月, 2013 4 次提交

perf/x86: Add support for PEBS Precise Store · 9ad64c0f

由 Stephane Eranian 提交于 1月 24, 2013

This patch adds support for PEBS Precise Store
which is available on Intel Sandy Bridge and
Ivy Bridge processors.

To use Precise store, the proper PEBS event
must be used: mem_trans_retired:precise_stores.
For the perf tool, the generic mem-stores event
exported via sysfs can be used directly.
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-11-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

9ad64c0f

perf/x86: Export PEBS load latency threshold register to sysfs · a63fcab4

由 Stephane Eranian 提交于 1月 24, 2013

Make the PEBS Load Latency threshold register layout
and encoding visible to user level tools.
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-10-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

a63fcab4

perf/x86: Add memory profiling via PEBS Load Latency · f20093ee

由 Stephane Eranian 提交于 1月 24, 2013

This patch adds support for memory profiling using the
PEBS Load Latency facility.

Load accesses are sampled by HW and the instruction
address, data address, load latency, data source, tlb,
locked information can be saved in the sampling buffer
if using the PERF_SAMPLE_COST (for latency),
PERF_SAMPLE_ADDR, PERF_SAMPLE_DATA_SRC types.

To enable PEBS Load Latency, users have to use the
model specific event:

 - on NHM/WSM: MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD
 - on SNB/IVB: MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD

To make things easier, this patch also exports a generic
alias via sysfs: mem-loads. It export the right event
encoding based on the host CPU and can be used directly
by the perf tool.

Loosely based on Intel's Lin Ming patch posted on LKML
in July 2011.
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-9-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

f20093ee

perf/x86: Add flags to event constraints · 9fac2cf3

由 Stephane Eranian 提交于 1月 24, 2013

This patch adds a flags field to each event constraint.
It can be used to store event specific features which can
then later be used by scheduling code or low-level x86 code.

The flags are propagated into event->hw.flags during the
get_event_constraint() call. They are cleared during the
put_event_constraint() call.

This mechanism is going to be used by the PEBS-LL patches.
It avoids defining yet another table to hold event specific
information.
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-4-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

9fac2cf3

28 3月, 2013 1 次提交

xen/mmu: Move the setting of pvops.write_cr3 to later phase in bootup. · d3eb2c89

由 Konrad Rzeszutek Wilk 提交于 3月 22, 2013

We move the setting of write_cr3 from the early bootup variant
(see git commit 0cc9129d
"x86-64, xen, mmu: Provide an early version of write_cr3.")
to a more appropiate location.

This new location sets all of the other non-early variants
of pvops calls - and most importantly is before the
alternative_asm mechanism kicks in.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

d3eb2c89

27 3月, 2013 2 次提交

perf/x86: Improve sysfs event mapping with event string · 3a54aaa0

由 Stephane Eranian 提交于 1月 24, 2013

This patch extends Jiri's changes to make generic
events mapping visible via sysfs. The patch extends
the mechanism to non-generic events by allowing
the mappings to be hardcoded in strings.

This mechanism will be used by the PEBS-LL patch
later on.
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-3-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
[ fixed up conflict with 2663960c "perf: Make EVENT_ATTR global" ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

3a54aaa0

perf/x86: Support CPU specific sysfs events · 1a6461b1

由 Andi Kleen 提交于 1月 24, 2013

Add a way for the CPU initialization code to register additional
events, and merge them into the events attribute directory. Used
in the next patch.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-2-git-send-email-eranian@google.com
[ small cleanups ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>
[ merge_attr returns a **, not just * ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

1a6461b1

25 3月, 2013 1 次提交

intel-pstate: Use #defines instead of hard-coded values. · 05e99c8c

由 Konrad Rzeszutek Wilk 提交于 3月 20, 2013

They are defined in coreboot (MSR_PLATFORM) and the other
one is already defined in msr-index.h.

Let's use those.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
Acked-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

05e99c8c

22 3月, 2013 2 次提交

xen-pciback: notify hypervisor about devices intended to be assigned to guests · 909b3fdb

由 Jan Beulich 提交于 3月 12, 2013

For MSI-X capable devices the hypervisor wants to write protect the
MSI-X table and PBA, yet it can't assume that resources have been
assigned to their final values at device enumeration time. Thus have
pciback do that notification, as having the device controlled by it is
a prerequisite to assigning the device to guests anyway.

This is the kernel part of hypervisor side commit 4245d33 ("x86/MSI:
add mechanism to fully protect MSI-X table from PV guest accesses") on
the master branch of git://xenbits.xen.org/xen.git.

CC: stable@vger.kernel.org
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

909b3fdb

x86, microcode_intel_early: Mark apply_microcode_early() as cpuinit · f564c241

由 H. Peter Anvin 提交于 3月 21, 2013

Add missing __cpuinit annotation to apply_microcode_early().
Reported-by: NShaun Ruffell <sruffell@digium.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/20130320170310.GA23362@digium.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

f564c241

21 3月, 2013 1 次提交

perf/x86: Fix uninitialized pt_regs in intel_pmu_drain_bts_buffer() · 0e48026a

由 Stephane Eranian 提交于 3月 19, 2013

This patch fixes an uninitialized pt_regs struct in drain BTS
function. The pt_regs struct is propagated all the way to the
code_get_segment() function from perf_instruction_pointer()
and may get garbage.

We cannot simply inherit the actual pt_regs from the interrupt
because BTS must be flushed on context-switch or when the
associated event is disabled. And there we do not have a pt_regs
handy.

Setting pt_regs to all zeroes may not be the best option but it
is not clear what else to do given where the drain_bts_buffer()
is called from.

In V2, we move the memset() later in the code to avoid doing it
when we end up returning early without doing the actual BTS
processing. Also dropped the reg.val initialization because it
is redundant with the memset() as suggested by PeterZ.
Signed-off-by: NStephane Eranian <eranian@google.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: sqazi@google.com
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/20130319151038.GA25439@quadSigned-off-by: NIngo Molnar <mingo@kernel.org>

0e48026a

20 3月, 2013 3 次提交

x86-32, microcode_intel_early: Fix crash with CONFIG_DEBUG_VIRTUAL · c83a9d5e

由 Fenghua Yu 提交于 3月 19, 2013

In 32-bit, __pa_symbol() in CONFIG_DEBUG_VIRTUAL accesses kernel data
(e.g.  max_low_pfn) that not only hasn't been setup yet in such early
boot phase, but since we are in linear mode, cannot even be detected
as uninitialized.

Thus, use __pa_nodebug() rather than __pa_symbol() to get a global
symbol's physical address.
Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1363705484-27645-1-git-send-email-fenghua.yu@intel.comReported-and-tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

c83a9d5e

KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797) · 0b79459b

由 Andy Honig 提交于 2月 20, 2013

There is a potential use after free issue with the handling of
MSR_KVM_SYSTEM_TIME. If the guest specifies a GPA in a movable or removable
memory such as frame buffers then KVM might continue to write to that
address even after it's removed via KVM_SET_USER_MEMORY_REGION. KVM pins
the page in memory so it's unlikely to cause an issue, but if the user
space component re-purposes the memory previously used for the guest, then
the guest will be able to corrupt that memory.

Tested: Tested against kvmclock unit test
Signed-off-by: NAndrew Honig <ahonig@google.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

0b79459b

KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) · c300aa64

由 Andy Honig 提交于 3月 11, 2013

If the guest sets the GPA of the time_page so that the request to update the
time straddles a page then KVM will write onto an incorrect page.  The
write is done byusing kmap atomic to get a pointer to the page for the time
structure and then performing a memcpy to that page starting at an offset
that the guest controls.  Well behaved guests always provide a 32-byte aligned
address, however a malicious guest could use this to corrupt host kernel
memory.

Tested: Tested against kvmclock unit test.
Signed-off-by: NAndrew Honig <ahonig@google.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

c300aa64

19 3月, 2013 2 次提交

KVM: x86: fix deadlock in clock-in-progress request handling · c09664bb

由 Marcelo Tosatti 提交于 3月 18, 2013

There is a deadlock in pvclock handling:

cpu0:                                               cpu1:
kvm_gen_update_masterclock()
                                              kvm_guest_time_update()
 spin_lock(pvclock_gtod_sync_lock)
                                               local_irq_save(flags)

spin_lock(pvclock_gtod_sync_lock)

 kvm_make_mclock_inprogress_request(kvm)
  make_all_cpus_request()
   smp_call_function_many()

Now if smp_call_function_many() called by cpu0 tries to call function on
cpu1 there will be a deadlock.

Fix by moving pvclock_gtod_sync_lock protected section outside irq
disabled section.

Analyzed by Gleb Natapov <gleb@redhat.com>
Acked-by: NGleb Natapov <gleb@redhat.com>
Reported-and-Tested-by: NYongjie Ren <yongjie.ren@intel.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

c09664bb

x86-64: Fix the failure case in copy_user_handle_tail() · 66db3feb

由 CQ Tang 提交于 3月 18, 2013

The increment of "to" in copy_user_handle_tail() will have incremented
before a failure has been noted.  This causes us to skip a byte in the
failure case.

Only do the increment when assured there is no failure.
Signed-off-by: NCQ Tang <cq.tang@intel.com>
Link: http://lkml.kernel.org/r/20130318150221.8439.993.stgit@phlsvslse11.ph.intel.comSigned-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>

66db3feb

18 3月, 2013 3 次提交

perf/x86: Add SNB/SNB-EP scheduling constraints for cycle_activity event · fd4a5aef

由 Stephane Eranian 提交于 3月 17, 2013

Add scheduling constraints for SNB/SNB-EP CYCLE_ACTIVITY event
as defined by SDM Jan 2013 edition. The STALLS umasks are
combinations with the NO_DISPATCH umask.
Signed-off-by: NStephane Eranian <eranian@gmail.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/20130317134957.GA8550@quadSigned-off-by: NIngo Molnar <mingo@kernel.org>

fd4a5aef

kprobes/x86: Check Interrupt Flag modifier when registering probe · 9a556ab9

由 Masami Hiramatsu 提交于 3月 14, 2013

Currently kprobes check whether the copied instruction modifies
IF (interrupt flag) on each probe hit. This results not only in
introducing overhead but also involving
inat_get_opcode_attribute into the kprobes hot path, and it can
cause an infinite recursive call (and kernel panic in the end).

Actually, since the copied instruction itself can never be modified
on the buffer, it is needless to analyze the instruction on every
probe hit.

To fix this issue, we check it only once when registering probe
and store the result on ainsn->if_modifier.
Reported-by: NTimo Juhani Lindfors <timo.lindfors@iki.fi>
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130314115242.19690.33573.stgit@mhiramat-M0-7522Signed-off-by: NIngo Molnar <mingo@kernel.org>

9a556ab9

perf,x86: fix wrmsr_on_cpu() warning on suspend/resume · 2a6e06b2

由 Linus Torvalds 提交于 3月 17, 2013

Commit 1d9d8639 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") fixed a crash when doing PEBS performance profiling
after resuming, but in using init_debug_store_on_cpu() to restore the
DS_AREA mtrr it also resulted in a new WARN_ON() triggering.

init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU
cross-calls to do the MSR update. Which is not really valid at the
early resume stage, and the warning is quite reasonable. Now, it all
happens to _work_, for the simple reason that smp_call_function_single()
ends up just doing the call directly on the CPU when the CPU number
matches, but we really should just do the wrmsr() directly instead.

This duplicates the wrmsr() logic, but hopefully we can just remove the
wrmsr_on_cpu() version eventually.
Reported-and-tested-by: NParag Warudkar <parag.lkml@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2a6e06b2

16 3月, 2013 1 次提交

perf,x86: fix kernel crash with PEBS/BTS after suspend/resume · 1d9d8639

由 Stephane Eranian 提交于 3月 15, 2013

This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. Turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps it power-on/resume value of 0 causing any PEBS
measurement to crash when running on CPU0.

The workaround is to add a hook in the actual resume code to restore
the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
the DS_AREA will be restored twice but this is harmless.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1d9d8639

13 3月, 2013 1 次提交

Select VIRT_TO_BUS directly where needed · 4febd95a

由 Stephen Rothwell 提交于 3月 07, 2013

In commit 887cbce0 ("arch Kconfig: centralise ARCH_NO_VIRT_TO_BUS")
I introduced the config sybmol HAVE_VIRT_TO_BUS and selected that where
needed.  I am not sure what I was thinking.  Instead, just directly
select VIRT_TO_BUS where it is needed.
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4febd95a

08 3月, 2013 1 次提交

x86: Do not try to sync identity map for non-mapped pages · 60f583d5

由 Dave Hansen 提交于 3月 07, 2013

kernel_map_sync_memtype() is called from a variety of contexts. The
pat.c code that calls it seems to ensure that it is not called for
non-ram areas by checking via pat_pagerange_is_ram(). It is important
that it only be called on the actual identity map because there *IS*
no map to sync for highmem pages, or for memory holes.

The ioremap.c uses are not as careful as those from pat.c, and call
kernel_map_sync_memtype() on PCI space which is in the middle of the
kernel identity map _range_, but is not actually mapped.

This patch adds a check to kernel_map_sync_memtype() which probably
duplicates some of the checks already in pat.c. But, it is necessary
for the ioremap.c uses and shouldn't hurt other callers.

I have reproduced this bug and this patch fixes it for me and the
original bug reporter:

https://lkml.org/lkml/2013/2/5/396Signed-off-by: NDave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130307163151.D9B58C4E@kernel.stglabs.ibm.comSigned-off-by: NDave Hansen <dave@sr71.net>
Tested-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

60f583d5

07 3月, 2013 1 次提交

x86, doc: Be explicit about what the x86 struct boot_params requires · 3c4aff6b

由 Peter Jones 提交于 3月 06, 2013

If the sentinel triggers, we do not want the boot loader authors to
just poke it and make the error go away, we want them to actually fix
the problem.

This should help avoid making the incorrect change in non-compliant
bootloaders.

[ hpa: dropped the Documentation/x86/boot.txt hunk pending
  clarifications ]
Signed-off-by: NPeter Jones <pjones@redhat.com>
Link: http://lkml.kernel.org/r/1362592823-28967-1-git-send-email-pjones@redhat.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

3c4aff6b

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功