- 30 Jan 2008, 1 commit
-
-
Committed by Ingo Molnar
Do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC in HLT too, not just when going through the ACPI methods. (The ACPI idle code already does this.)

[ Update the 64-bit side too, as noticed by Jiri Slaby. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 15 Jan 2008, 1 commit
-
-
Committed by Steven Rostedt
Sometimes cpu_idle_wait gets stuck because it might miss CPUs that are already in idle, have no tasks waiting to run and have no interrupts going to them. This is common on bootup when switching cpu idle governors. This patch gives those CPUs that don't check in an IPI kick.

Background:
-----------
I noticed this while developing the mcount patches: every once in a while the system would hang. Looking deeper, the hang was always at boot up when registering init_menu of the cpu_idle menu governor. Talking with Thomas Gleixner, we discovered that one of the CPUs had no timer events scheduled for it and was in idle (running with NO_HZ), so that CPU would never set the cpu_idle_state bit. Hitting sysrq-t a few times would eventually route the interrupt to the stuck CPU and the system would continue.

Note, I would have used the PDA is_idle flag, but that is set after the cpu_idle_state bit is cleared, and would leave a window open where we may miss being kicked.

Hmm, looking closer at this, we still have a small race window between clearing cpu_idle_state and disabling interrupts (hence the RFC):

  CPU 0:                                 CPU 1:
  ---------                              ---------
  cpu_idle_wait():                       cpu_idle():
     |                                     __get_cpu_var(is_idle) = 1;
     |                                     if (__get_cpu_var(cpu_idle_state)) /* == 0 */
  per_cpu(cpu_idle_state, 1) = 1;             |
  if (per_cpu(is_idle, 1)) /* == 1 */         |
     smp_call_function(1)                     |
     |                                     receives ipi and runs do_nothing.
  wait on map == empty                     idle(); /* waits forever */

So really we need interrupts off for most of this. One might think that we could simply clear cpu_idle_state from do_nothing, but I'm assuming that cpu_idle governors can be removed, and this might cause a race where a governor is used after the module was removed.

Venki said:

  I think your RFC patch is the right solution here. As I see it, there is no race with your RFC patch. As long as you call a dummy smp_call_function on all CPUs, we should be OK. We can get rid of cpu_idle_state and the current wait-forever logic altogether with the dummy smp_call_function, and so there won't be any wait-forever scenario.

  The whole point of cpu_idle_wait() is to make all CPUs come out of the idle loop at least once. The caller will use cpu_idle_wait something like this:

  // Want to change the idle handler
  - Switch the global idle handler to the always-present default_idle
  - Call cpu_idle_wait so that all cpus come out of idle for an instant, stop using the old idle pointer and start using default_idle
  - Change the idle handler to the new handler
  - Optionally call cpu_idle_wait again if you want all cpus to start using the new handler immediately

  Maybe the below 1s patch is a safe bet for .24. But for .25, I would say we just replace all the complicated logic by a simple dummy smp_call_function and remove cpu_idle_state altogether.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
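For .25, the dummy cross-call that Venki describes reduces to something like the sketch below. The do_nothing name comes from the message above; the exact smp_call_function argument list varied across kernel versions, so treat the call as illustrative rather than literal:

  static void do_nothing(void *unused)
  {
          /* Empty on purpose: receiving the IPI is the point, since it
           * forces the target CPU out of its idle loop exactly once. */
  }

  void cpu_idle_wait(void)
  {
          /* Make any previous idle-handler update visible before the kick. */
          smp_mb();
          /* Run do_nothing on all other CPUs and wait for completion; when
           * this returns, every CPU has left idle at least once, with no
           * cpu_idle_state bookkeeping and no wait-forever window. */
          smp_call_function(do_nothing, NULL, 1);
  }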
-
- 20 Dec 2007, 1 commit
-
-
Committed by Adrian Bunk
CONFIG_HOTPLUG_CPU=y:

  WARNING: vmlinux.o(.text+0x1199a): Section mismatch: reference to
  .init.text.5:select_idle_routine (between 'init_intel' and 'init_nexgen')

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 20 Oct 2007, 2 commits
-
-
Committed by Alexey Dobriyan
One of the easiest things to isolate is the pid printed in the kernel log. There was a patch that made this so for arch-independent code; this one does it for the arch/xxx files. It took some time to cross-compile it, but hopefully these are all the printks in arch code.

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Pavel Emelyanov
Both functions printk the same information, except that show_registers() also prints the CRx and debug registers, and in a slightly different manner. So move the common code into one place. This is already done for x86_64, so I think it's worth having the same on i386. This saves 100 bytes of the .rodata section :) ... but only 8 from .text :(

[ tglx: arch/x86 adaptation ]

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 14 Oct 2007, 1 commit
-
-
Committed by Dave Jones
Since the x86 merge, lots of files that referenced their own filenames are no longer correct. Rather than keep them up to date, just delete them, as they add no real value. Additionally:

- fix up comment formatting in scx200_32.c
- remove a credit from myself in setup_64.c from a time when we had no SCM
- remove longwinded history from tsc_32.c which can be figured out from git

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 11 Oct 2007, 2 commits
-
-
Committed by Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 22 Jul 2007, 1 commit
-
-
Committed by Alan Stern
This patch (as921) adds code to the show_regs() routine in i386 and x86_64 to print the contents of the debug registers along with all the others.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Jul 2007, 1 commit
-
-
Committed by Andrea Arcangeli
This follows a suggestion from Chuck Ebbert on how to make seccomp absolutely zero-cost in schedule too. The only remaining footprint of seccomp is in terms of the bzImage size, which becomes a few bytes (perhaps even a few kbytes) larger; measure it if you care, in the embedded case.

Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 May 2007, 1 commit
-
-
Committed by Christoph Lameter
SLUB cannot run on i386 at this point because i386 uses the page->private and page->index fields of slab pages for the pgd cache. Make SLUB run on i386 by replacing the pgd slab cache with a quicklist. Limit the changes as much as possible: leave the improvised linked list in place, etc. This has been working here for a couple of weeks now.

Acked-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
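As a rough sketch of the resulting shape (using the <linux/quicklist.h> calls of that era; the quicklist index and constructor wiring here are illustrative, not the literal i386 code). Because the pages now come from a quicklist rather than a slab cache, page->private and page->index are freed up for SLUB:

  #include <linux/quicklist.h>

  #define QUICK_PGD 0   /* per-CPU quicklist index reserved for pgds */

  static void pgd_ctor(void *pgd)
  {
          /* One-time setup for a fresh page: prepopulate kernel
           * mappings and link the pgd into the pgd list. */
  }

  pgd_t *pgd_alloc(struct mm_struct *mm)
  {
          /* Reuses a page from the per-CPU quicklist when possible;
           * pgd_ctor runs only for pages that are freshly allocated. */
          return quicklist_alloc(QUICK_PGD, GFP_KERNEL, pgd_ctor);
  }

  void pgd_free(pgd_t *pgd)
  {
          /* Back onto the quicklist - no destructor needed here. */
          quicklist_free(QUICK_PGD, NULL, pgd);
  }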
-
- 09 May 2007, 1 commit
-
-
Committed by Randy Dunlap
Remove includes of <linux/smp_lock.h> where it is not used or needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 03 May 2007, 3 commits
-
-
Committed by Jeremy Fitzhardinge
Currently x86 (similar to x86-64) has a special per-cpu structure called "i386_pda" which can be easily and efficiently referenced via the %fs register. An ELF section is more flexible than a structure, allowing any piece of code to use this area. Indeed, such a section already exists: the per-cpu area. So this patch:

(1) Removes the PDA and uses per-cpu variables for each current member.
(2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
(3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which can be used to calculate addresses for this CPU's variables.
(4) Simplifies startup, because %fs doesn't need to be loaded with a special segment at early boot; it can be deferred until the first percpu area is allocated (or never, for UP).

The result is less code and one less x86-specific concept.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
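The flavor of the conversion, sketched below. The variable names follow the patch description; this illustrates the pattern rather than reproducing the full diff:

  #include <linux/sched.h>
  #include <linux/percpu.h>

  /* A former i386_pda member becomes an ordinary per-cpu variable
   * (pcurrent -> current_task is the obvious example). */
  DEFINE_PER_CPU(struct task_struct *, current_task);

  /* Per-cpu mirror of __per_cpu_offset: with %fs pointing at the
   * per-cpu area, this CPU's copy of any per-cpu variable is just
   * a %fs-relative address away. */
  DEFINE_PER_CPU(unsigned long, this_cpu_off);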
-
Committed by Rusty Russell
On Thu, 2007-03-29 at 13:16 +0200, Andi Kleen wrote:
> Please clean it up properly with two structs.

Not sure about this, now I've done it. Running it here. If you like it, I can do x86-64 as well.

==

lguest defines its own TSS struct because the "struct tss_struct" contains linux-specific additions. Andi asked me to split the struct in processor.h. Unfortunately it makes usage a little awkward.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Andi Kleen
It doesn't put the CPU into deeper sleep states, so it's better to use the standard idle loop to save power. But allow reenabling it anyway for benchmarking. I also removed the obsolete idle=halt on i386.

Cc: andreas.herrmann@amd.com
Signed-off-by: Andi Kleen <ak@suse.de>
-
- 27 Feb 2007, 1 commit
-
-
Committed by Linus Torvalds
This reverts commit 2ff2d3d7. Uwe Bugla reports that he cannot mount a floppy drive any more, and Jiri Slaby bisected it down to this commit. Benjamin LaHaise also points out that this is a big hot-path, and that interrupt delivery while idle is very common and should not go through all these expensive gyrations. Fix up conflicts in arch/i386/kernel/apic.c and arch/i386/kernel/irq.c due to other unrelated irq changes.

Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Uwe Bugla <uwe.bugla@gmx.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Feb 2007, 1 commit
-
-
Committed by Ingo Molnar
Prepare i386 for dyntick: idle handler callbacks.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Feb 2007, 4 commits
-
-
Committed by Stephane Eranian
Add a notifier mechanism to the low-level idle loop: you can register a callback function which gets invoked on entry to and exit from the low-level idle loop. The low-level idle loop is defined as the polling loop, the low-power call, or the mwait instruction. Interrupts processed by the idle thread are not considered part of the low-level loop. The notifier can be used to measure precisely how much time is spent in useless execution (or low-power mode). The perfmon subsystem uses it to turn monitoring on and off.

Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
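Registration follows the usual notifier pattern; below is a sketch from memory of the x86-64-style interface this mirrors (treat the IDLE_START/IDLE_END event names and the <asm/idle.h> header as assumptions):

  #include <linux/notifier.h>
  #include <asm/idle.h>

  static int profiler_idle_event(struct notifier_block *nb,
                                 unsigned long action, void *data)
  {
          if (action == IDLE_START)
                  ;  /* entering the low-level idle loop: pause counters */
          else if (action == IDLE_END)
                  ;  /* leaving it (e.g. for an interrupt): resume counters */
          return NOTIFY_OK;
  }

  static struct notifier_block profiler_idle_nb = {
          .notifier_call = profiler_idle_event,
  };

  /* somewhere in subsystem init: idle_notifier_register(&profiler_idle_nb); */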
-
Committed by Zachary Amsden
I found a clever way to make the extra IOPL switching invisible to non-paravirt compiles - since kernel_rpl is statically defined to be zero there, and only non-zero-rpl kernels have a problem restoring IOPL, as popf does not restore the IOPL flags unless run at CPL 0.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
-
Committed by Zachary Amsden
The VMI ROM has a mode where hypercalls can be queued and batched. This turns out to be a significant win during context switch, but must be done at a specific point before side effects to CPU state are visible to subsequent instructions. This is similar to the MMU batching hooks already provided. The same hooks could be used by the Xen backend to implement a context-switch multicall.

To explain a bit more about lazy modes in the paravirt patches: the idea is that only one of lazy CPU or MMU mode can be active at any given time. Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of multiple PTE updates (say, inside a remap loop), but to avoid keeping some kind of state machine about when to flush cpu or mmu updates, we just allow one or the other to be active. Although there is no real reason a more comprehensive scheme could not be implemented, there is also no demonstrated need for this extra complexity.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
-
Committed by Jeremy Fitzhardinge
Convert the PDA code to use %fs rather than %gs as the segment for per-processor data. This is because some processors show a small but measurable performance gain for reloading a NULL segment selector (as %fs generally is in user-space) versus a non-NULL one (as %gs generally is). On modern processors the difference is very small, perhaps undetectable. Some old AMD "K6 3D+" processors are noticeably slower when %fs is used rather than %gs; I have no idea why this might be, but I think they're sufficiently rare that it doesn't matter much. This patch also fixes the math emulator, which had not been adjusted to match the changed struct pt_regs.

[frederik.deweerdt@gmail.com: fixit with gdb]
[mingo@elte.hu: Fix KVM too]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Ian Campbell <Ian.Campbell@XenSource.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Zachary Amsden <zach@vmware.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
-
- 23 Dec 2006, 1 commit
-
-
Committed by Ingo Molnar
Fernando Lopez-Lezcano reported frequent scheduling latencies and audio xruns starting at the 2.6.18-rt kernel, and those problems persisted all the way up to current -rt kernels. The latencies were serious and unjustified by system load, often in the milliseconds range.

After a patient and heroic multi-month effort by Fernando, where he tested dozens of kernels, tried various configs, boot options and test-patches of mine, and provided latency traces of those incidents, the following 'smoking gun' trace was captured by him:

                     _------=> CPU#
                    / _-----=> irqs-off
                   | / _----=> need-resched
                   || / _---=> hardirq/softirq
                   ||| / _--=> preempt-depth
                   |||| /
                   |||||     delay
       cmd     pid ||||| time  |   caller
          \   /    |||||   \   |   /
  IRQ_19-1479    1D..1    0us : __trace_start_sched_wakeup (try_to_wake_up)
  IRQ_19-1479    1D..1    0us : __trace_start_sched_wakeup <<...>-5856> (37 0)
  IRQ_19-1479    1D..1    0us : __trace_start_sched_wakeup (c01262ba 0 0)
  IRQ_19-1479    1D..1    0us : resched_task (try_to_wake_up)
  IRQ_19-1479    1D..1    0us : __spin_unlock_irqrestore (try_to_wake_up)
  ...
  <idle>-0       1...1   11us!: default_idle (cpu_idle)
  ...
  <idle>-0       0Dn.1  602us : smp_apic_timer_interrupt (c0103baf 1 0)
  ...
  <...>-5856     0D..2  618us : __switch_to (__schedule)
  <...>-5856     0D..2  618us : __schedule <<idle>-0> (20 162)
  <...>-5856     0D..2  619us : __spin_unlock_irq (__schedule)
  <...>-5856     0...1  619us : trace_stop_sched_switched (__schedule)
  <...>-5856     0D..1  619us : trace_stop_sched_switched <<...>-5856> (37 0)

What is visible in this trace is that CPU#1 ran try_to_wake_up() for PID:5856, placed PID:5856 on CPU#0's runqueue and ran resched_task() for CPU#0. But it decided not to send an IPI to CPU#0, due to TS_POLLING. Yet CPU#0 never woke up after its NEED_RESCHED bit was set, and only rescheduled to PID:5856 upon the next lapic timer IRQ. The result was a 600+ usecs latency and a missed wakeup!

The bug turned out to be an idle-wakeup bug introduced into the mainline kernel this summer via an optimization in the x86_64 tree:

  commit 495ab9c0
  Author: Andi Kleen <ak@suse.de>
  Date:   Mon Jun 26 13:59:11 2006 +0200

      [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status

      During some profiling I noticed that default_idle causes a lot of
      memory traffic. I think that is caused by the atomic operations
      to clear/set the polling flag in thread_info. There is actually
      no reason to make this atomic - only the idle thread does it to
      itself, other CPUs only read it. So I moved it into ti->status.

The problem is this type of change:

  if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
  -       clear_thread_flag(TIF_POLLING_NRFLAG);
  +       current_thread_info()->status &= ~TS_POLLING;
          smp_mb__after_clear_bit();
          while (!need_resched()) {
                  local_irq_disable();

This changes clear_thread_flag() to an explicit clearing of TS_POLLING. clear_thread_flag() is defined as:

  clear_bit(flag, &ti->flags);

and clear_bit() is a LOCK-ed atomic instruction on all x86 platforms:

  static inline void clear_bit(int nr, volatile unsigned long * addr)
  {
          __asm__ __volatile__( LOCK_PREFIX
                  "btrl %1,%0"

hence smp_mb__after_clear_bit() is defined as a simple compile barrier:

  #define smp_mb__after_clear_bit() barrier()

but the explicit TS_POLLING clearing introduced by the patch:

  +       current_thread_info()->status &= ~TS_POLLING;

is not an atomic op! So the clearing of the TS_POLLING bit is freely reorderable with the reading of the NEED_RESCHED bit - and both now reside in different memory addresses.

CPU idle wakeup very much depends on ordered memory ops: the clearing of the TS_POLLING flag must always be done before we test need_resched() and hit the idle instruction(s). [Symmetrically, the wakeup code needs to set NEED_RESCHED before it tests the TS_POLLING flag, so memory ordering is paramount.]

Fernando's dual-core Athlon64 system has a sufficiently advanced memory ordering model that it triggered this scenario very often. (And it also turned out that the reason these latencies never triggered on my test systems is that I routinely use idle=poll, which was the only idle variant not affected by this bug.)

The fix is to change the smp_mb__after_clear_bit() to an smp_mb(), to act as an absolute barrier between the TS_POLLING write and the NEED_RESCHED read. This affects almost all idling methods (default, ACPI, APM), on all 3 x86 architectures: i386, x86_64, ia64.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
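In sketch form, the fixed idle-entry sequence reads as follows (assembled from the snippets quoted above; the surrounding loop details are elided):

  /* Clearing TS_POLLING is a plain (non-atomic) store, so a compile
   * barrier cannot order it against the NEED_RESCHED read below -
   * only a full memory barrier can. */
  current_thread_info()->status &= ~TS_POLLING;
  smp_mb();                       /* was: smp_mb__after_clear_bit() */
  while (!need_resched())
          safe_halt();            /* or the ACPI/APM low-power call */
  current_thread_info()->status |= TS_POLLING;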
-
- 07 Dec 2006, 6 commits
-
-
Committed by Chuck Ebbert
IOPL is implicitly saved and restored on task switch, so the explicit check is no longer needed.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Andi Kleen
Currently the idle loop has two nested loops: one high-level loop in cpu_idle, and another one in some of the low-level idle functions. Looping in the low-level idle functions breaks the idle notifiers, because interrupts waking up sleep states need to execute exit_idle(), which is only in cpu_idle(). So don't do that; only loop in cpu_idle(). This only removes code. In some cases, e.g. poll_idle, the idle loop is a little longer now because cpu_idle checks more things; I hope that isn't a problem. ACPI idle doesn't change behaviour because it never looped anyway.

Cc: len.brown@intel.com
Cc: eranian@hpl.hp.com
Signed-off-by: Andi Kleen <ak@suse.de>
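Schematically, the converged structure looks like this (a sketch: enter_idle()/__exit_idle() follow the x86-64 naming, and the pm_idle selection is simplified):

  void cpu_idle(void)
  {
          for (;;) {
                  while (!need_resched()) {
                          void (*idle)(void) = pm_idle ? pm_idle : default_idle;

                          enter_idle();
                          idle();          /* sleeps once, then returns */
                          __exit_idle();   /* notifiers stay balanced */
                  }
                  preempt_enable_no_resched();
                  schedule();
                  preempt_disable();
          }
  }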
-
Committed by Jeremy Fitzhardinge
Use the pcurrent field in the PDA to implement the "current" macro. This ends up compiling down to a single instruction to get the current task.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
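The implementation is tiny; this is essentially the i386 <asm/current.h> of that era, reproduced here from memory:

  #include <asm/pda.h>

  struct task_struct;

  static __always_inline struct task_struct *get_current(void)
  {
          /* One %fs-relative load from the PDA's pcurrent slot. */
          return read_pda(pcurrent);
  }

  #define current get_current()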
-
Committed by Jeremy Fitzhardinge
There are a few places where the change in struct pt_regs and the use of %gs affect the userspace ABI. These are primarily debugging interfaces where thread state can be inspected or extracted.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
-
Committed by Jeremy Fitzhardinge
This patch is the meat of the PDA change. It makes several related changes:

1: Most significantly, %gs is now used in the kernel. This means that on entry, the old value of %gs is saved away, and it is reloaded with __KERNEL_PDA.
2: entry.S constructs the stack in the shape of struct pt_regs, and this is passed around the kernel so that the process's saved register state can be accessed. Unfortunately struct pt_regs doesn't currently have space for %gs (or %fs). This patch extends pt_regs to add space for gs (no space is allocated for %fs, since it won't be used, and it would just complicate the code in entry.S to work around the space).
3: Because %gs is now saved on the stack like %ds, %es and the integer registers, there are a number of places where it no longer needs to be handled specially, namely context switch, and saving/restoring the register state in a signal context.
4: And since kernel threads run in kernel space and call normal kernel code, they need to be created with their %gs == __KERNEL_PDA.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
-
Committed by Chuck Ebbert
i386 port of the sLeAZY-fpu feature. Chuck reports that this gives him a +/- 0.4% improvement on his simple benchmark. The x86_64 description follows:

Right now the kernel on x86-64 has 100% lazy fpu behavior: after *every* context switch a trap is taken for the first FPU use, to restore the FPU context lazily. This is of course great for applications that have very sporadic or no FPU use (since then you avoid doing the expensive save/restore all the time). However, for very frequent FPU users... you take an extra trap every context switch.

The patch below adds a simple heuristic to this code: after 5 consecutive context switches of FPU use, the lazy behavior is disabled and the context gets restored every context switch. If the app indeed uses the FPU, the trap is avoided (the chance of the 6th time slice using the FPU after the previous 5 having done so is quite high, obviously). After 256 switches, this is reset and lazy behavior is returned (until there are 5 consecutive ones again). The reason for this is to give apps that do longer bursts of FPU use the lazy behavior back after some time.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
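The heuristic in sketch form. An 8-bit fpu_counter is what makes the 256-switch reset fall out for free; the exact placement in __switch_to() and the trap handler is simplified here:

  /* In the #NM (device-not-available) trap handler, i.e. the lazy
   * path: count consecutive FPU-using time slices. As an unsigned
   * char, the counter wraps after 256 switches, restoring lazy mode. */
  tsk->fpu_counter++;

  /* In __switch_to(): after 5 consecutive FPU-using slices, restore
   * the FPU state eagerly so the next slice takes no #NM trap. */
  if (next_p->fpu_counter > 5)
          math_state_restore();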
-
- 18 Nov 2006, 1 commit
-
-
Committed by Ingo Molnar
The scheduler on Andreas Friedrich's hyperthreading system stopped working properly: the scheduler would never move tasks to another CPU! The last known working kernel was 2.6.8. After a couple of attempts to corner the bug, the following smoking gun was found:

  BIOS reported wrong ACPI id for the processor
  CPU#1: set_cpus_allowed(), swapper:1, 3 -> 2
   [<c0103bbe>] show_trace_log_lvl+0x34/0x4a
   [<c0103ceb>] show_trace+0x2c/0x2e
   [<c01045f8>] dump_stack+0x2b/0x2d
   [<c0116a77>] set_cpus_allowed+0x52/0xec
   [<c0101d86>] cpu_idle_wait+0x2e/0x100
   [<c0259c57>] acpi_processor_power_exit+0x45/0x58
   [<c0259752>] acpi_processor_remove+0x46/0xea
   [<c025c6fb>] acpi_start_single_object+0x47/0x54
   [<c025cee5>] acpi_bus_register_driver+0xa4/0xd3
   [<c04ab2d7>] acpi_processor_init+0x57/0x77
   [<c01004d7>] init+0x146/0x2fd
   [<c0103a87>] kernel_thread_helper+0x7/0x10

A quick look at cpu_idle_wait() shows how broken that code is on i386: it changes the init task's affinity map but never restores it ... and because all userspace tasks get forked by init, they all inherited that single-CPU affinity mask. x86_64 cloned this bug too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andreas Friedrich <andreas.friedrich@fujitsu-siemens.com>
Cc: Wolfgang Erig <Wolfgang.Erig@fujitsu-siemens.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
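The shape of the fix, sketched with the cpumask API of that era (the loop body is elided; only the save/restore is the point):

  void cpu_idle_wait(void)
  {
          /* Save the caller's affinity mask up front; never restoring
           * it was the bug - init kept the temporary single-CPU mask,
           * and every task forked from init inherited it. */
          cpumask_t saved = current->cpus_allowed;

          set_cpus_allowed(current, cpu_online_map);
          /* ... bounce across CPUs / wait for each to leave idle ... */
          set_cpus_allowed(current, saved);    /* restore */
  }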
-
- 22 Oct 2006, 1 commit
-
-
Committed by Andi Kleen
Jan convinced me that it was unnecessary, because the assembly stubs already do this on the stack.

Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
-
- 14 Oct 2006, 1 commit
-
-
Committed by Venkatesh Pallipadi
Intel processors starting with the Core Duo support processor-native C-states using the MWAIT instruction. Refer: Intel Architecture Software Developer's Manual http://www.intel.com/design/Pentium4/manuals/253668.htm

Platform firmware exports the support for native C-states to the OS using the ACPI _PDC and _CST methods. Refer: Intel Processor Vendor-Specific ACPI: Interface Specification http://www.intel.com/technology/iapc/acpi/downloads/302223.htm

With processor-native C-states, we use the 'MWAIT' instruction on the processor to enter the different C-states (C1, C2, C3). We won't use the special IO ports to enter C-states, and no SMM mode etc. is required. Overall this will mean better C-state support. One major advantage of using MWAIT for all C-states is that, combined with the "treat interrupt as break event" feature of MWAIT, we can now get accurate timing for the time spent in the C1, C2, ... states.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
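The core of an MWAIT-based C-state entry, sketched (close in shape to the kernel's mwait_idle_with_hints() helper; the eax hint encoding that selects the C-state is left to the caller):

  #include <asm/processor.h>   /* __monitor(), __mwait() */

  static void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
  {
          /* Arm the monitor on this task's flags word so that setting
           * NEED_RESCHED (any write to the monitored line) breaks out
           * of MWAIT; with "treat interrupt as break event", pending
           * interrupts do too. */
          __monitor((void *)&current_thread_info()->flags, 0, 0);
          smp_mb();
          if (!need_resched())
                  __mwait(eax, ecx);   /* eax hints the target C-state */
  }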
-
- 06 Oct 2006, 1 commit
-
-
Committed by Andi Kleen
Always make sure RIP/EIP is 0 in the registers stored on the top of the stack of a kernel thread. This makes sure the unwinder code won't try a fallback but knows the stack has ended.

AK: This patch is a bit mysterious. In theory the stacks should be terminated anyway, but it seems to fix at least one crash. Anyway, double termination probably doesn't hurt.

Signed-off-by: Andi Kleen <ak@suse.de>
-
- 02 Oct 2006, 1 commit
-
-
Committed by Serge E. Hallyn
In some places, particularly drivers and __init code, the init utsns is the appropriate one to use. This patch replaces those with the init_utsname helper.

Changes: removed several uses of init_utsname(). Hope I picked all the right ones in net/ipv4/ipconfig.c. These are now changed to utsname() (the per-process namespace utsname) in the previous patch (2/7).

[akpm@osdl.org: CIFS fix]

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
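Usage in sketch form (both helpers are the ones this series settles on; the printks themselves are illustrative):

  #include <linux/utsname.h>

  /* Drivers and __init code reporting the boot kernel's identity
   * use the initial namespace explicitly: */
  printk(KERN_INFO "running %s %s\n",
         init_utsname()->sysname, init_utsname()->release);

  /* Process-context code that should honor the caller's uts
   * namespace uses utsname() instead: */
  printk(KERN_INFO "nodename: %s\n", utsname()->nodename);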
-
- 01 Oct 2006, 1 commit
-
-
Committed by Alexey Dobriyan
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- 26 Sep 2006, 3 commits
-
-
Committed by Rusty Russell
We allow for the fact that the guest kernel may not run in ring 0. This requires some abstraction in a few places when setting %cs or checking privilege level (user vs kernel). This is Chris' [RFC PATCH 15/33] move segment checks to subarch, except that rather than using a #define USER_MODE_MASK which depends on a config option, we use Zach's more flexible approach of assuming ring 3 == userspace. I also used "get_kernel_rpl()" over "get_kernel_cs()" because I think it reads better in the code...

1) Remove the hardcoded 3 and introduce #define SEGMENT_RPL_MASK 3
2) Add a get_kernel_rpl() macro, and don't assume it's zero.

And a cleanup of the patch for letting the kernel run at a ring other than 0:

a. Add some comments about the SEGMENT_IS_*_CODE() macros.
b. Add a USER_RPL macro. (Code was comparing a value to a mask in some places and to the magic number 3 in other places.)
c. Add macros for the table indicator field and use them.
d. Change the entry.S tests for an LDT stack segment to use the macros.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
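The resulting idiom, sketched (SEGMENT_RPL_MASK and USER_RPL are the macros described above; the user_mode() body is a simplified illustration, not the literal header):

  #define SEGMENT_RPL_MASK 3   /* requested privilege level: low 2 bits */
  #define USER_RPL         3   /* userspace is assumed to be ring 3 */

  /* Checking the RPL instead of hardcoding ring numbers means a
   * paravirtualized kernel running in ring 1 still classifies
   * kernel vs. user mode correctly. */
  static inline int user_mode(struct pt_regs *regs)
  {
          return (regs->xcs & SEGMENT_RPL_MASK) == USER_RPL;
  }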
-
Committed by Andi Kleen
And add proper CFI annotation to it, which was previously impossible. This prevents "stuck" messages from the dwarf2 unwinder when reaching the top of a kernel stack. Includes feedback from Jan Beulich.

Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Andi Kleen
Based on a patch from Frank van Maarseveen <frankvm@frankvm.com>, but extended.

Signed-off-by: Andi Kleen <ak@suse.de>
-
- 29 Jul 2006, 1 commit
-
-
Committed by Chuck Ebbert
Recent changes in i386 __switch_to() have a misplaced closing parenthesis, causing an unlikely() to terminate early.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
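For flavor, the class of bug, shown on a simplified condition (a hypothetical reconstruction, not the literal diff):

  /* Intended: the whole flag test sits inside unlikely(). */
  if (unlikely(task_thread_info(prev)->flags & _TIF_WORK_CTXSW))
          __switch_to_xtra(prev, next, tss);

  /* Misplaced paren: unlikely() now wraps only the flags word, and
   * the '& _TIF_WORK_CTXSW' applies to unlikely()'s 0/1 result - so
   * the condition no longer tests the intended bits. */
  if (unlikely(task_thread_info(prev)->flags) & _TIF_WORK_CTXSW)
          __switch_to_xtra(prev, next, tss);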
-
- 10 Jul 2006, 1 commit
-
-
Committed by Stephane Eranian
Use thread-info flags to track use of debug registers and I/O bitmaps:

- add TIF_DEBUG to track when debug registers are active
- add TIF_IO_BITMAP to track when the I/O bitmap is used
- modify __switch_to() to use the new TIF flags

Performance tested on a Pentium II with ten runs of the LMbench context-switch benchmark (smaller is better):

         before   after
  avg    3.65     3.39
  min    3.55     3.33

Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
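The fast path this buys, sketched (the combined-mask spelling is illustrative; the two TIF flags are the ones the commit adds):

  #define _TIF_WORK_CTXSW (_TIF_DEBUG | _TIF_IO_BITMAP)

  /* __switch_to() tests one combined mask per task instead of always
   * probing the debug registers and the I/O bitmap; the expensive
   * path runs only when either task actually uses those features. */
  if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW ||
               task_thread_info(next_p)->flags & _TIF_WORK_CTXSW))
          __switch_to_xtra(prev_p, next_p, tss);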
-
- 01 Jul 2006, 1 commit
-
-
Committed by Jörn Engel
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
-