- 21 Sep, 2009: 3 commits
-
-
Committed by Ingo Molnar
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out of its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring and analysis facility.

Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support, the 'counter' name is less and less appropriate.

All in all, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names (in an ABI-compatible fashion).

The word 'event' is also a bit shorter than 'counter', which makes it slightly more convenient to write and handle as well.

Thanks go to Stephane Eranian, who first observed this misnomer and suggested a rename.

User-space tooling and ABI compatibility are not affected - this patch should be function-invariant. (Also, defconfigs were not touched, to keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming.

We tried to time this change to the point where the amount of pending patches is smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch, and some stylistic fallout will be fixed up in a subsequent patch.

(NOTE: 'counters' are still the proper terminology when we deal with hardware registers, and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch.)

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Ingo Molnar
In preparation for the renames, to avoid a namespace clash.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
Dave noticed that we leak the PMU resource reservations when we fail the hardware counter init.

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David Miller <davem@davemloft.net>
LKML-Reference: <1252483487.7746.164.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 20 Sep, 2009: 5 commits
-
-
Committed by Jaswinder Singh Rajput
fix the following 'make includecheck' warning:

  arch/x86/kernel/cpu/common.c: linux/smp.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1252087783.6385.10.camel@ht.satnam>
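As a minimal illustration (hypothetical file contents, not the actual common.c), this is the pattern the warning flags and such a fix removes:

    #include <linux/smp.h>
    #include <linux/sched.h>
    #include <linux/smp.h>  /* duplicate: 'make includecheck' flags it; the fix deletes this line */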
-
Committed by Jaswinder Singh Rajput
fix the following 'make includecheck' warning:

  arch/x86/mm/kmemcheck/shadow.c: linux/module.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1247065179.4382.51.camel@ht.satnam>
-
Committed by Jaswinder Singh Rajput
fix the following 'make includecheck' warning:

  arch/x86/kernel/traps.c: asm/traps.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1247065094.4382.49.camel@ht.satnam>
-
Committed by Kay Sievers
This allows subsystems to provide devtmpfs with non-default permissions for the device node. Instead of the default mode of 0600, null, zero, random, urandom, full, tty and ptmx now have a mode of 0666, which allows non-privileged processes to access standard device nodes in case no other userspace process applies the expected permissions.

This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complaint.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
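A minimal sketch of how a driver can report such a mode through the devnode() hook this change extends (the exact argument type has varied across kernel versions, mode_t vs. umode_t; the function body here is illustrative):

    /* sketch: world-accessible node mode for a 'null'-style device */
    static char *null_devnode(struct device *dev, mode_t *mode)
    {
            if (mode)
                    *mode = 0666;   /* instead of the 0600 default */
            return NULL;            /* keep the default node name */
    }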
-
Committed by Arjan van de Ven
The "end of a C state" trace point currently happens before the code runs that corrects the TSC for having stopped during idle.

The result of this is that the timestamp of the end-of-C-state event is garbage on cpus where the TSC stops during idle.

This patch moves the end point of the C state to after the timekeeping engine of the kernel has been corrected.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: fweisbec@gmail.com
Cc: peterz@infradead.org
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090919133533.139c2a46@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 19 Sep, 2009: 2 commits
-
-
Committed by Arjan van de Ven
This patch converts the existing power tracer into an event tracer, so that power events (C states and frequency changes) can be tracked via "perf".

This also removes the perl script that was used to demo the tracer; its functionality is being replaced entirely with timechart.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090912130542.6d314860@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Markus Metzger
Draining the BTS buffer on a buffer overflow interrupt takes too long, resulting in a kernel lockup when tracing the kernel.

Restructure perf_counter sampling into sample creation and sample output.

Prepare a single reference sample for BTS sampling and update the from and to address fields when draining the BTS buffer. Drain the entire BTS buffer between a single perf_output_begin() / perf_output_end() pair.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090915130023.A16204@sedona.ch.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
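A rough sketch of the restructured drain loop; the helper names follow the perf_counter code of that era, but their exact signatures should be read as assumptions:

    /* one reference sample, one output section for the whole buffer */
    if (perf_output_begin(&handle, counter, header.size * count, 1, 1))
            return;

    for (at = buffer_base; at < buffer_top; at++) {
            /* only the from/to addresses differ per BTS record */
            data.ip   = at->from;
            data.addr = at->to;
            perf_output_sample(&handle, &header, &data, counter);
    }

    perf_output_end(&handle);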
-
- 18 Sep, 2009: 1 commit
-
-
Committed by Suresh Siddha
A recent enhancement of the rb-tree based lookup exposed a bug in the lookup mechanism of reserve_memtype(), which ensures that there are no conflicting memtype requests for the memory range.

memtype_rb_search() returns an entry whose start address is <= the new start address, and from there we traverse the linear linked list to check whether there are any conflicts with the existing mappings. As the rbtree is keyed on the start address of the memory range, it is quite possible that we have several overlapping mappings whose start address is much less than the newly requested start but whose end is >= the newly requested end. This results in conflicting memtype mappings.

The same bug exists in the old code, which traverses the linear linked list starting from cached_entry, but the new rb-tree code exposes it fairly easily.

For now, don't use memtype_rb_search() and always start the search from the head of the linear linked list in reserve_memtype(). On most systems the linear linked list grows to only a few tens of entries (as we track the memory type of RAM pages using struct page), so we should be OK for now.

We still retain the rbtree and use it to speed up free_memtype(), which doesn't have the same bug (as we know exactly what we are searching for in free_memtype). Also use list_for_each_entry_from() in free_memtype() so that we start the search from the rb-tree lookup result.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <1253136483.4119.12.camel@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
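A sketch of the conservative conflict scan this falls back to; the field and list names follow the memtype code of the time, but treat the details as assumptions:

    /* an entry that starts before the new range can still overlap it
     * if its end reaches past the new start, so scan from the head */
    list_for_each_entry(entry, &memtype_list, nd) {
            if (entry->start < new_end && entry->end > new_start &&
                entry->type != req_type)
                    return -EBUSY;  /* conflicting memtype request */
    }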
-
- 16 Sep, 2009: 6 commits
-
-
Committed by Kurt Roeckx
Fixes bugzilla #13780

From: Kurt Roeckx <kurt@roeckx.be>
Signed-off-by: Dave Jones <davej@redhat.com>
-
Committed by Peter Zijlstra
Sysbench thinks SD_BALANCE_WAKE is too aggressive and kbuild doesn't really mind too much; SD_BALANCE_NEWIDLE picks up most of the slack.

On a dual socket, quad core, dual thread Nehalem system:

  sysbench (--num_threads=16):
    SD_BALANCE_WAKE-: 13982 tx/s
    SD_BALANCE_WAKE+: 15688 tx/s

  kbuild (-j16):
    SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% )
    SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% )
    (same within noise)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Feng Tang
get/set_wallclock() already have a set of platform-dependent implementations (default, EFI, paravirt), and MRST will add another variant. Moving them to platform ops simplifies the existing code and minimizes the effort to integrate new variants.

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
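The resulting indirection looks roughly like this (struct and function names as in arch/x86 of that period; treat the exact layout as illustrative):

    struct x86_platform_ops {
            unsigned long (*get_wallclock)(void);
            int (*set_wallclock)(unsigned long nowtime);
    };

    /* the default keeps the existing CMOS/RTC behaviour; EFI,
     * paravirt and later MRST simply install their own hooks */
    struct x86_platform_ops x86_platform = {
            .get_wallclock = mach_get_cmos_time,
            .set_wallclock = mach_set_rtc_mmss,
    };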
-
Committed by Andreas Herrmann
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
-
Committed by Peter Zijlstra
Silly percpu bits don't respect static..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
-
Committed by Thomas Gleixner
init_IRQ() and x86_late_time_init() are missing __init annotations. The x86 platform ops variables are annotated, but the annotation needs to be put between the variable name and the "=" of the initializer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
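For example (a sketch of the placement rule, not the exact patch hunks):

    /* function annotation goes before the name: */
    void __init init_IRQ(void)
    {
            /* ... */
    }

    /* variable annotation goes between the name and the '=': */
    struct x86_init_ops x86_init __initdata = {
            /* ... */
    };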
-
- 15 Sep, 2009: 10 commits
-
-
Committed by Peter Zijlstra
APERF/MPERF support for cpu_power.

APERF/MPERF is arch-defined to be a relative scale of work capacity per logical cpu; this is assumed to include SMT and Turbo mode.

APERF/MPERF are specified to both reset to 0 when either counter wraps, which is highly inconvenient, since that'll give a blip when it happens. The manual specifies writing 0 to the counters after each read, but that's 1) too expensive, and 2) destroys the possibility of sharing these counters with other users, so we live with the blip - the other existing user does too.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
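A minimal sketch of the sampling idea (the MSR names are the architectural ones; the delta-ratio bookkeeping here is illustrative):

    u64 aperf, mperf, ratio;

    rdmsrl(MSR_IA32_APERF, aperf);
    rdmsrl(MSR_IA32_MPERF, mperf);

    /* relative work capacity since the previous sample; both
     * counters reset together on wrap, hence the occasional blip */
    ratio = div64_u64((aperf - prev_aperf) * SCHED_LOAD_SCALE,
                      mperf - prev_mperf);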
-
Committed by Peter Zijlstra
Move some of the aperf/mperf code out from the cpufreq driver thingy so that other people can enjoy it too.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: cpufreq@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
Move the APERFMPERF capability into an X86_FEATURE flag so that it can be used outside of the acpi cpufreq driver.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: cpufreq@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
If we're looking to place a new task, we might as well find the idlest position _now_, not 1 tick ago.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Mike Galbraith
Make the idle balancer more aggressive, to improve an x264 encoding workload provided by Jason Garrett-Glaser:

  NEXT_BUDDY NO_LB_BIAS
    encoded 600 frames, 252.82 fps, 22096.60 kb/s
    encoded 600 frames, 250.69 fps, 22096.60 kb/s
    encoded 600 frames, 245.76 fps, 22096.60 kb/s

  NO_NEXT_BUDDY LB_BIAS
    encoded 600 frames, 344.44 fps, 22096.60 kb/s
    encoded 600 frames, 346.66 fps, 22096.60 kb/s
    encoded 600 frames, 352.59 fps, 22096.60 kb/s

  NO_NEXT_BUDDY NO_LB_BIAS
    encoded 600 frames, 425.75 fps, 22096.60 kb/s
    encoded 600 frames, 425.45 fps, 22096.60 kb/s
    encoded 600 frames, 422.49 fps, 22096.60 kb/s

Peter pointed out that this is better done via newidle_idx, not via LB_BIAS: newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago.

Worst-case latencies are improved as well, as no buddies means less vruntime spread (as per prior lkml discussions).

This change improves kbuild-peak parallelism as well.

Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx; restore that and set them to 0 to make wake balancing more aggressive.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
The problem with wake_idle() is that it doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction.

To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self().

Modify sched_balance_self() to:

  - update_shares() when walking up the domain tree (it only called
    it for the top domain, but it should have done this anyway),
    which allows us to remove this ugly bit from try_to_wake_up().

  - do wake_affine() on the smallest domain that contains both this
    (the waking) and the prev (the wakee) cpu for WAKE invocations.

Then use the top-down balance steps it had to replace wake_idle().

This leads to the disappearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced by SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.

Touch all topology bits to replace the old with new SD flags -- platforms might need re-tuning. Enabling SD_BALANCE_WAKE conditionally on NUMA distance seems like a good additional feature: Magny-Cours and small Nehalem systems would want this enabled; systems with slow interconnects would not.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Andi Kleen
Fix compilation error in arch/x86/kernel/cpu/mcheck/mce-severity.c when CONFIG_DEBUG_FS is disabled, introduced in commit 5be9ed25.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Committed by Borislav Petkov
Now that decoding is done in-kernel, suppress the mcelog message part.

CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
-
Committed by Borislav Petkov
Move the NB decoder along with the required defines to the EDAC MCE core. Add registration routines for further decoding of the MCE info in the AMD64 EDAC module.

CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
-
- 13 Sep, 2009: 1 commit
-
-
Committed by Jiri Olsa
Only 24 bytes need to be reserved on the stack for the function graph tracer on x86_64.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <20090729085837.GB4998@jolsa.lab.eng.brq.redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
- 11 Sep, 2009: 4 commits
-
-
Committed by Michal Hocko
Currently we do not include the randomized stack size when calculating mmap_base in arch_pick_mmap_layout for the topdown case. This can cause mmap_base to start inside the area reserved for the stack, because the stack is randomized by up to 1GB on 64-bit (8MB on 32-bit) while the minimum gap is only 128MB. If the stack then grows down to mmap_base, the stack can silently overwrite an mmap region.

Let's include the maximum stack randomization size in MIN_GAP, which is used as the lower bound for the gap in mmap.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Stable Team <stable@kernel.org>
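The core of the fix, roughly (stack_maxrandom_size() is the helper this change introduces; the surrounding mmap_base() logic is simplified):

    #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
    #define MAX_GAP (TASK_SIZE/6*5)

    static unsigned long mmap_base(void)
    {
            unsigned long gap = current->signal->rlim[RLIMIT_STACK].rlim_cur;

            if (gap < MIN_GAP)
                    gap = MIN_GAP;  /* now covers worst-case stack randomization */
            else if (gap > MAX_GAP)
                    gap = MAX_GAP;

            return PAGE_ALIGN(TASK_SIZE - gap - mmap_rnd());
    }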
-
Committed by Ben Hutchings
As reported in <http://bugs.debian.org/511703> and <http://bugs.debian.org/515982>, kernels with paravirt-alternatives enabled crash in text_poke_early() on at least some 486-class processors.

The problem is that text_poke_early() itself uses inline functions affected by paravirt-alternatives and so will modify instructions that have already been prefetched. Pentium and later processors will invalidate the prefetched instructions in this case, but 486-class processors do not.

Change sync_core() to limit prefetching on 486-class (and 386-class) processors, and move the call to sync_core() above the call to the modifiable local_irq_restore().

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
LKML-Reference: <1252547631.3423.134.camel@localhost>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Committed by Steven Rostedt
The dynamic function tracer relies on the macro P6_NOP5 always being an atomic NOP. If for some reason it is changed to be two operations (like a nop2 nop3), the kernel can fault when the function tracer modifies the code.

This patch adds a comment to note that the P6_NOPs are expected to be atomic. This will hopefully prevent anyone from changing that.

Reported-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
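For reference, a sketch of the instruction in question (the byte encoding is the standard 5-byte 'nopl'; treat the exact define as illustrative of asm/nops.h):

    /* one atomic 5-byte instruction: nopl 0x0(%rax,%rax,1) */
    #define P6_NOP5 ".byte 0x0f,0x1f,0x44,0x00,0\n"

    /* NOT equivalent for the tracer: a nop2 + nop3 pair could be
     * observed half-patched by another CPU and fault */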
-
Committed by Jeremy Fitzhardinge
Split __phys_addr out into its own file so we can disable -fstack-protector in a fine-grained fashion. Also it doesn't have terribly much to do with the rest of ioremap.c.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 10 Sep, 2009: 8 commits
-
-
Committed by Avi Kivity
Debug registers may only be accessed from cpl 0. Unfortunately, the vmx code will emulate the instruction even though it was issued from guest userspace, possibly leading to an unexpected trap later.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
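The shape of the fix, as a sketch (helper names follow the kvm code of that period; treat them and the exact injected exception as assumptions):

    /* in the DR-access exit handler: refuse emulation from guest
     * userspace and inject a fault, as real hardware would */
    if (vmx_get_cpl(vcpu) > 0) {
            kvm_inject_gp(vcpu, 0);
            return 1;
    }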
-
Committed by Gleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Marcelo Tosatti
kvm_mmu_slot_remove_write_access already calls it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Gleb Natapov
No need to call it before each kvm_(set|get)_msr_common().

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Avi Kivity
Only reload debug register 6 if we're running with the guest's debug registers. Saves around 150 cycles from the guest lightweight exit path.

dr6 contains a couple of bits that are updated on #DB, so intercept that unconditionally and update those bits then.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Avi Kivity
Instead of saving the debug registers from the processor to a kvm data structure, rely on the debug registers stored in the thread structure. This allows us not to save dr6 and dr7.

Reduces lightweight vmexit cost by 350 cycles, or 11 percent.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Marcelo Tosatti
Commit b8bcfe99 made paravirt pte updates synchronous in interrupt context. Unfortunately the KVM pv mmu code caches the lazy/nonlazy mode internally, so a pte update from interrupt context during a lazy mmu operation can be batched while it should be performed synchronously.

https://bugzilla.redhat.com/show_bug.cgi?id=518022

Drop the internal mode variable and use paravirt_get_lazy_mode(), which returns the correct state.

Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
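A sketch of the changed behaviour; paravirt_get_lazy_mode() and PARAVIRT_LAZY_MMU are the real interfaces, while the surrounding helper and flush call are illustrative of the KVM pv-mmu code:

    #include <asm/paravirt.h>

    static void kvm_deferred_mmu_op(void *buf, int len)
    {
            /* ... queue the hypercall arguments ... */

            /* query the authoritative state instead of a locally
             * cached flag, so an update from interrupt context in
             * the middle of a lazy section is flushed synchronously */
            if (paravirt_get_lazy_mode() != PARAVIRT_LAZY_MMU)
                    mmu_queue_flush(state);  /* illustrative flush helper */
    }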
-
Committed by Glauber Costa
The use of __pa() to calculate the address of a C-visible symbol is wrong and can lead to unpredictable results. See arch/x86/include/asm/page.h for details.

It should be replaced with __pa_symbol(), which does the correct math here by taking relocations into account. This ensures the correct wallclock data structure physical address is passed to the hypervisor.

Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
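The difference, in miniature (the MSR name is from the KVM paravirt ABI; the surrounding call is a simplified sketch of the kvmclock setup):

    /* wrong: __pa() assumes a direct-mapping address and ignores
     * kernel relocation */
    /* wrmsrl(MSR_KVM_WALL_CLOCK, __pa(&wall_clock)); */

    /* right: __pa_symbol() does the relocation-aware math for
     * kernel symbols */
    wrmsrl(MSR_KVM_WALL_CLOCK, __pa_symbol(&wall_clock));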
-