Commits · a8a93f3f03b7a8008d720e8d91798efe599d416c · openeuler / raspberrypi-kernel

22 Feb, 2009 1 commit

x86, mm: fault.c, simplify kmmio_fault(), cleanup · b319eed0

Committed by Ingo Molnar on Feb 22, 2009

Clarify the kmmio_fault() comment.

Acked-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

b319eed0

21 Feb, 2009 15 commits

x86, mm: fault.c, update copyrights · f8eeb2e6

Committed by Ingo Molnar on Feb 20, 2009

Signed-off-by: Ingo Molnar <mingo@elte.hu>

f8eeb2e6

x86, mm: fault.c, give another attempt at prefetch handling before SIGBUS · cd1b68f0

Committed by Ingo Molnar on Feb 20, 2009

Impact: extend prefetch handling on 64-bit

Currently there's an extra is_prefetch() check done in do_sigbus(),
which we only do on 32 bits.

This is a last-ditch check before we terminate a task, so it's worth
giving prefetch instructions another chance - should none of our
existing quirks have caught a prefetch instruction related spurious
fault.

The only risk is if a prefetch causes a real sigbus, in that case
we'll not OOM but try another fault. But this code has been on
32-bit for a long time, so it should be fine in practice.

So do this on 64-bit too - and thus remove one more #ifdef.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cd1b68f0

x86, mm: fault.c, remove #ifdef from fault_in_kernel_space() · 7c178a26

Committed by Ingo Molnar on Feb 20, 2009

Impact: cleanup

Removal of an #ifdef in fault_in_kernel_space(), by making
use of the new TASK_SIZE_MAX symbol which is now available
on 32-bit too.
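
How a single limit symbol removes the #ifdef can be sketched as follows. This is a minimal user-space illustration, not the kernel source: the numeric limits and the SKETCH_64BIT switch are assumptions made for the example; only the names TASK_SIZE, TASK_SIZE_MAX and fault_in_kernel_space() come from the commit messages.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative limits only (assumptions for this sketch); the kernel's
 * real values come from its own headers and configuration.
 */
#ifdef SKETCH_64BIT
#define TASK_SIZE_MAX	((uint64_t)1 << 47)
#else
#define TASK_SIZE	((uint64_t)0xC0000000)
#define TASK_SIZE_MAX	TASK_SIZE	/* 32-bit: mapped to TASK_SIZE */
#endif

/* One symbol is valid on both builds, so the check needs no #ifdef. */
static bool fault_in_kernel_space(uint64_t address)
{
	return address >= TASK_SIZE_MAX;
}
```

With SKETCH_64BIT left undefined the sketch models the 32-bit case, where TASK_SIZE_MAX is simply another name for TASK_SIZE.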

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

7c178a26

x86, mm: rename TASK_SIZE64 => TASK_SIZE_MAX · d9517346

Committed by Ingo Molnar on Feb 20, 2009

Impact: cleanup

Rename TASK_SIZE64 to TASK_SIZE_MAX, and provide the
define on 32-bit too. (mapped to TASK_SIZE)

This allows 32-bit code to make use of the (former-) TASK_SIZE64
symbol as well, in a clean way.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

d9517346

x86, mm: fault.c, remove #ifdef from do_page_fault() · c3731c68

Committed by Ingo Molnar on Feb 20, 2009

Impact: cleanup

do_page_fault() has this ugly #ifdef in its prototype:

  #ifdef CONFIG_X86_64
  asmlinkage
  #endif
  void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)

Replace it with 'dotraplinkage' which maps to exactly the above
construct: nothing on 32-bit and asmlinkage on 64-bit.
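
The mapping described above can be sketched roughly as below. This is a stand-alone illustration under stated assumptions: asmlinkage is modelled as an empty macro, CONFIG_X86_64 is left undefined, and the function body and the last_error_code variable are placeholders invented for the example.

```c
/*
 * Sketch only: asmlinkage is modelled as an empty macro here (an
 * assumption); in the kernel it selects a calling convention.
 * CONFIG_X86_64 is left undefined, so this builds the 32-bit variant.
 */
#define asmlinkage

#ifdef CONFIG_X86_64
# define dotraplinkage asmlinkage
#else
# define dotraplinkage
#endif

struct pt_regs;			/* opaque in this sketch */

static unsigned long last_error_code;	/* hypothetical, for demonstration */

/* The prototype now reads the same on both architectures. */
dotraplinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
	(void)regs;
	last_error_code = error_code;	/* placeholder body */
}
```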

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

c3731c68

x86, mm: fault.c, unify oops handling · 1cc99544

Committed by Ingo Molnar on Feb 20, 2009

Impact: add oops-recursion check to 32-bit

Unify the oops state-machine, to the 64-bit version. It is
slightly more careful in that it does a recursion check
in oops_begin(), and is thus more likely to show the relevant
oops.

It also means that 32-bit will print one more line at the
end of pagefault triggered oopses:

 	printk(KERN_EMERG "CR2: %016lx\n", address);

Which is generally good information to be seen in partial-dump
digital-camera jpegs ;-)

The downside is the somewhat more complex critical path. Both
variants have been tested well meanwhile by kernel developers
crashing their boxes, so I don't think this is a practical worry.

This removes 3 ugly #ifdefs from no_context() and makes the
function a lot nicer to read.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1cc99544

x86, mm: fault.c, unify oops printing · 8f766149

Committed by Ingo Molnar on Feb 20, 2009

Impact: refine/extend page fault related oops printing on 64-bit

 - honor the pause_on_oops logic on 64-bit too
 - print out NX fault warnings on 64-bit as well
 - factor out the NX fault message to make it git-greppable and readable

Note that this means that we do the PF_INSTR check on 32-bit non-PAE
as well where it should not occur ... normally. Cannot hurt.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

8f766149

x86, mm: fault.c, reorder functions · f2f13a85

Committed by Ingo Molnar on Feb 20, 2009

Impact: cleanup

Avoid a couple more #ifdefs by moving fundamentally non-unifiable
functions into a single #ifdef 32-bit / #else / #endif block in
fault.c: vmalloc*(), dump_pagetable(), check_v8086_mode().

No code changed:

   text	   data	    bss	    dec	    hex	filename
   4618	     32	     24	   4674	   1242	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f2f13a85

x86, mm, kprobes: fault.c, simplify notify_page_fault() · b1801812

Committed by Ingo Molnar on Feb 20, 2009

Impact: cleanup

Remove an #ifdef from notify_page_fault(). The function still
compiles to nothing in the !CONFIG_KPROBES case.

Introduce kprobes_built_in() and kprobe_fault_handler() helpers
to allow this - they return 0 if !CONFIG_KPROBES.
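
The stub pattern can be sketched like this. It is a simplified illustration, not the kernel code: the body of notify_page_fault() is reduced to the one check that matters here, and CONFIG_KPROBES is left undefined so that the stubs are the ones compiled.

```c
struct pt_regs;			/* opaque in this sketch */

#ifdef CONFIG_KPROBES
/* Real implementations would be provided elsewhere in this case. */
static inline int kprobes_built_in(void) { return 1; }
int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
#else
/* Stubs: a constant 0 lets the compiler discard the caller's branch. */
static inline int kprobes_built_in(void) { return 0; }
static inline int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
{
	(void)regs;
	(void)trapnr;
	return 0;
}
#endif

/* Simplified caller: no #ifdef is needed in the function body. */
static inline int notify_page_fault(struct pt_regs *regs)
{
	/* 14 is the x86 page-fault vector number. */
	if (kprobes_built_in() && kprobe_fault_handler(regs, 14))
		return 1;
	return 0;
}
```

Because both stubs return a compile-time constant, an optimizing compiler can reduce the caller to a plain "return 0", which matches the "compiles to nothing" remark above.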

No code changed:

   text	   data	    bss	    dec	    hex	filename
   4618	     32	     24	   4674	   1242	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

b1801812

x86, mm: fault.c, simplify kmmio_fault() · b814d41f

Committed by Ingo Molnar on Feb 20, 2009

Impact: cleanup

Remove an #ifdef from kmmio_fault() - we can do this by
providing default implementations for is_kmmio_active()
and kmmio_handler(). The compiler optimizes it all away
in the !CONFIG_MMIOTRACE case.

Also, while at it, clean up mmiotrace.h a bit:

 - standard header guards
 - standard vertical spaces for structure definitions

No code changed (both with mmiotrace on and off in the config):

   text	   data	    bss	    dec	    hex	filename
   2947	     12	     12	   2971	    b9b	fault.o.before
   2947	     12	     12	   2971	    b9b	fault.o.after

Cc: Pekka Paalanen <pq@iki.fi>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

b814d41f

x86, mm: fault.c, enable PF_RSVD checks on 32-bit too · 121d5d0a

Committed by Ingo Molnar on Feb 20, 2009

Impact: improve page fault handling robustness

The 'PF_RSVD' flag (bit 3) of the page-fault error_code is a
relatively recent addition to x86 CPUs, so the 32-bit do_fault()
implementation never had it. This flag gets set when the CPU
detects nonzero values in any reserved bits of the page directory
entries.

Extend the existing 64-bit check for PF_RSVD in do_page_fault()
to 32-bit too. If we detect such a fault then we print a more
informative oops and the pagetables.
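
For orientation, the error_code bits involved can be sketched as below. The bit positions follow the x86 page-fault error code layout; the helper is_reserved_bit_fault() is a hypothetical name introduced only for this illustration.

```c
#include <stdbool.h>

/* Page-fault error_code bits on x86. */
enum x86_pf_error_code {
	PF_PROT  = 1 << 0,	/* 0: page not present, 1: protection fault */
	PF_WRITE = 1 << 1,	/* 0: read access,      1: write access */
	PF_USER  = 1 << 2,	/* 0: kernel mode,      1: user mode */
	PF_RSVD  = 1 << 3,	/* 1: reserved bit set in a paging entry */
	PF_INSTR = 1 << 4,	/* 1: fault on an instruction fetch */
};

/* Hypothetical helper: true if the CPU reported a reserved-bit violation. */
static bool is_reserved_bit_fault(unsigned long error_code)
{
	return (error_code & PF_RSVD) != 0;
}
```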

This unifies the code some more, removes an ugly #ifdef and improves
the 32-bit page fault code robustness a bit. It slightly increases
the 32-bit kernel text size.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

121d5d0a

x86, mm: fault.c, factor out the vm86 fault check · 8c938f9f

Committed by Ingo Molnar on Feb 20, 2009

Impact: cleanup

Instead of an ugly, open-coded, #ifdef-ed vm86 related legacy check
in do_page_fault(), put it into the check_v8086_mode() helper
function and merge it with an existing #ifdef.

Also, simplify the code flow a tiny bit in the helper.

No code changed:

arch/x86/mm/fault.o:

   text	   data	    bss	    dec	    hex	filename
   2711	     12	     12	   2735	    aaf	fault.o.before
   2711	     12	     12	   2735	    aaf	fault.o.after

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

8c938f9f

x86, mm: fault.c, refactor/simplify the is_prefetch() code · 107a0367

Committed by Ingo Molnar on Feb 20, 2009

Impact: no functionality changed

Factor out the opcode checker into a helper inline.

The code got a tiny bit smaller:

   text	   data	    bss	    dec	    hex	filename
   4632	     32	     24	   4688	   1250	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

And it got cleaner / easier to review as well.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

107a0367

x86, mm: fault.c cleanup · 2d4a7167

Committed by Ingo Molnar on Feb 20, 2009

Impact: cleanup, no code changed

Clean up various small details, which can be correctness checked
automatically:

 - tidy up the include file section
 - eliminate unnecessary includes
 - introduce show_signal_msg() to clean up code flow
 - standardize the code flow
 - standardize comments and other style details
 - more cleanups, pointed out by checkpatch

No code changed on either 32-bit or 64-bit:

arch/x86/mm/fault.o:

   text	   data	    bss	    dec	    hex	filename
   4632	     32	     24	   4688	   1250	fault.o.before
   4632	     32	     24	   4688	   1250	fault.o.after

the md5 changed due to a change in a single instruction:

   2e8a8241e7f0d69706776a5a26c90bc0  fault.o.before.asm
   c5c3d36e725586eb74f0e10692f0193e  fault.o.after.asm

Because a __LINE__ reference in a WARN_ONCE() has changed.

On 32-bit a few stack offsets changed - no code size difference
nor any functionality difference.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

2d4a7167

x86: check PMD in spurious_fault handler · 3c3e5694

Committed by Steven Rostedt on Feb 19, 2009

Impact: fix to prevent hard lockup on bad PMD permissions

If the PMD does not have the correct permissions for a page access,
but the PTE does, the spurious fault handler will mistake the fault
for a lazy TLB transaction. This will result in an infinite loop of:

 fault -> spurious_fault check (pass) -> return to code -> fault

This patch adds a check, and a warning, for the case where the PTE passes
the permissions check but the PMD does not.
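
The two-level check can be modelled roughly as follows. This is a user-space sketch under loud assumptions: struct entry_perms, entry_allows() and is_spurious_fault() are hypothetical names, page-table entries are reduced to two permission flags, and fprintf() stands in for the kernel's WARN_ONCE().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified page-table entry: just the permission bits. */
struct entry_perms {
	bool writable;
	bool executable;
};

/* Does this entry allow the access that faulted? */
static bool entry_allows(const struct entry_perms *e, bool write, bool exec)
{
	if (write && !e->writable)
		return false;
	if (exec && !e->executable)
		return false;
	return true;
}

/*
 * Returns true only if the fault can be treated as spurious, i.e. both
 * levels already permit the access and a stale TLB entry is the likely cause.
 */
static bool is_spurious_fault(const struct entry_perms *pmd,
			      const struct entry_perms *pte,
			      bool write, bool exec)
{
	if (!entry_allows(pte, write, exec))
		return false;		/* a real fault */

	if (!entry_allows(pmd, write, exec)) {
		/* Stand-in for the kernel's WARN_ONCE(). */
		fprintf(stderr, "PMD has incorrect permission bits\n");
		return false;		/* do not retry forever */
	}
	return true;
}
```

Without the second check, a writable PTE under a read-only PMD would be reported as spurious on every retry, which is the loop described above.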

[ Updated: Ingo Molnar suggested using WARN_ONCE with some text ]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>

3c3e5694

06 Feb, 2009 1 commit

prevent kprobes from catching spurious page faults · 9be260a6

Committed by Masami Hiramatsu on Feb 05, 2009

Prevent kprobes from catching spurious faults, which would cause infinitely
recursive page faults and memory corruption by stack overflow.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: <stable@kernel.org>		[2.6.28.x]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9be260a6

05 Feb, 2009 1 commit

x86: mm: introduce helper function in fault.c · 0973a06c

Committed by Hiroshi Shimamoto on Feb 04, 2009

Impact: cleanup

Introduce helper function fault_in_kernel_address() to make editors happy.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

0973a06c

29 Jan, 2009 1 commit

x86: add might_sleep() to do_page_fault() · 01006074

Committed by Peter Zijlstra on Jan 29, 2009

Impact: widen debug checks

VirtualBox calls do_page_fault() from an atomic context but runs into a
might_sleep() way past this point; cure that.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

01006074

22 Jan, 2009 1 commit

x86: optimise page fault entry, cleanup · fb746d0e

Committed by Johannes Weiner on Jan 21, 2009

tsk is already assigned to current, drop the redundant second
assignment.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fb746d0e

20 Jan, 2009 1 commit

x86: optimise x86's do_page_fault (C entry point for the page fault path) · 92181f19

Committed by Nick Piggin on Jan 20, 2009

Impact: cleanup, restructure code to improve assembly

gcc isn't _all_ that smart about spilling registers to stack or reusing
stack slots, even with branch annotations. do_page_fault contained a lot
of functionality, so split unlikely paths into their own functions, and
mark them as noinline just to be sure. I consider this actually to be
somewhat of a cleanup too: the main function now contains about half
the number of lines so the normal path is easier to read, while the error
cases are also nicely split away.
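
The technique can be sketched in user space as below. This is an illustration only: the unlikely() and noinline macros are defined locally in GCC/Clang style, and report_bad_area() and handle_fault() are hypothetical names, not functions from fault.c.

```c
#include <stdio.h>

/* GCC/Clang-style annotations, defined locally for this sketch. */
#define unlikely(x)	__builtin_expect(!!(x), 0)
#define noinline	__attribute__((noinline))

/*
 * Cold path: the large buffer lives in this function's own stack frame,
 * so the hot path below does not pay for it.
 */
static noinline int report_bad_area(unsigned long address)
{
	char msg[256];

	snprintf(msg, sizeof(msg), "bad area at %#lx\n", address);
	fputs(msg, stderr);
	return -1;
}

/* Hot path: small frame, with the error case branched out of line. */
static int handle_fault(unsigned long address, int vma_found)
{
	if (unlikely(!vma_found))
		return report_bad_area(address);
	return 0;
}
```

Had report_bad_area() been inlined, the compiler would be free to reserve the 256-byte buffer in handle_fault()'s frame on every call, which is the kind of stack growth the commit is reducing.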

Also, ensure the order of arguments to functions is always the same: regs,
addr, error_code. This can reduce code size a tiny bit, and just looks neater
too.

And add a couple of branch annotations.

Before:
  do_page_fault:
          subq    $360, %rsp      #,

After:
  do_page_fault:
          subq    $56, %rsp       #,

bloat-o-meter:
  add/remove: 8/0 grow/shrink: 0/1 up/down: 2222/-1680 (542)
  function                                     old     new   delta
  __bad_area_nosemaphore                         -     506    +506
  no_context                                     -     474    +474
  vmalloc_fault                                  -     424    +424
  spurious_fault                                 -     358    +358
  mm_fault_error                                 -     272    +272
  bad_area_access_error                          -      89     +89
  bad_area                                       -      89     +89
  bad_area_nosemaphore                           -      10     +10
  do_page_fault                               2464     784   -1680

Yes, the total size increases by 542 bytes, due to the extra function calls.
But these will very rarely be called (except for vmalloc_fault) in a normal
workload. Importantly, do_page_fault is less than 1/3rd its original size,
and touches far less stack.

Existing gotos and branch hints did move a lot of the infrequently used text
out of the fastpath, but that's even further improved after this patch.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

92181f19

13 Jan, 2009 1 commit

x86: avoid theoretical vmalloc fault loop · f313e123

Committed by Andi Kleen on Jan 09, 2009

Ajith Kumar noticed:

 I was going through the vmalloc fault handling for x86_64 and am unclear
 about the following lines in the vmalloc_fault() function.

 pgd = pgd_offset(current->mm ?: &init_mm, address);
 pgd_ref = pgd_offset_k(address);

 Here the intention is to get the pgd corresponding to the current process
 and sync it up with the pgd in init_mm(obtained from pgd_offset_k).
 However, for kernel threads current->mm is NULL and hence pgd =
 pgd_offset(init_mm, address) = pgd_ref which means the fault handler
 returns without setting the pgd entry in the MM structure in the context
 of which the kernel thread has faulted.  This could lead to never-ending
 faults and busy looping of kernel threads like pdflush.  So, shouldn't the
 pgd = pgd_offset(current->mm ?: &init_mm, address); be pgd =
 pgd_offset(current->active_mm ?: &init_mm, address);
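
The scenario in the report can be modelled in a few lines. This is a sketch with pared-down, hypothetical structures; fault_mm_old() and fault_mm_new() are names invented here to contrast the two expressions quoted above.

```c
#include <stddef.h>

/* Hypothetical, pared-down versions of the kernel structures. */
struct mm_struct { int id; };
struct task_struct {
	struct mm_struct *mm;		/* NULL for kernel threads */
	struct mm_struct *active_mm;	/* the mm actually loaded on the CPU */
};

static struct mm_struct init_mm;

/* Old choice: a kernel thread falls back to init_mm, the reference itself. */
static struct mm_struct *fault_mm_old(const struct task_struct *t)
{
	return t->mm ? t->mm : &init_mm;
}

/* Proposed choice: always the mm whose page tables are in use. */
static struct mm_struct *fault_mm_new(const struct task_struct *t)
{
	return t->active_mm;
}
```

For a kernel thread that has borrowed another task's mm, the old expression yields init_mm, so the sync compares init_mm with itself and changes nothing.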

We can use active_mm unconditionally because it should always be set.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f313e123

07 Jan, 2009 1 commit

mm: invoke oom-killer from page fault · 1c0fe6e3

Committed by Nick Piggin on Jan 06, 2009

Rather than have the pagefault handler kill a process directly if it gets
a VM_FAULT_OOM, have it call into the OOM killer.

With increasingly sophisticated oom behaviour (cpusets, memory cgroups,
oom killing throttling, oom priority adjustment or selective disabling,
panic on oom, etc), it's silly to unconditionally kill the faulting
process at page fault time.  Create a hook for pagefault oom path to call
into instead.

Only converted x86 and uml so far.

[akpm@linux-foundation.org: make __out_of_memory() static]
[akpm@linux-foundation.org: fix comment]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeff Dike <jdike@addtoit.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1c0fe6e3

14 Nov, 2008 1 commit

CRED: Wrap task credential accesses in the x86 arch · 350b4da7

Committed by David Howells on Nov 14, 2008

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: James Morris <jmorris@namei.org>

350b4da7

27 Oct, 2008 1 commit

trace: add the MMIO-tracer to the tracer menu, cleanup · fd3fdf11

Committed by Pekka Paalanen on Oct 24, 2008

Impact: cleanup

We can remove MMIOTRACE_HOOKS and replace it with just MMIOTRACE.
MMIOTRACE_HOOKS is a remnant from the time when I thought that
something else could also use the kmmio facilities.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fd3fdf11

22 Oct, 2008 1 commit

x86, dumpstack: let signr=0 signal no do_exit · 874d93d1

Committed by Alexander van Heukelum on Oct 22, 2008

Change oops_end such that signr=0 signals that do_exit
is not to be called.
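
The convention can be sketched as follows. This is a deliberately reduced model: the real oops_end() takes more arguments and does more work, and do_exit_stub() with its call counter is a hypothetical stand-in so the example can run in user space.

```c
/* Hypothetical stand-in: counts calls instead of terminating the task. */
static int do_exit_calls;

static void do_exit_stub(int code)
{
	(void)code;
	do_exit_calls++;
}

/*
 * Simplified oops_end(): the real function takes more arguments and does
 * more work; only the signr convention is modelled here.
 */
static void oops_end_sketch(int signr)
{
	if (!signr)
		return;		/* signr == 0: caller does not want do_exit() */
	do_exit_stub(signr);
}
```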

Currently, each use of __die is soon followed by a call
to oops_end and 'regs' is set to NULL if oops_end is expected
not to call do_exit. Change all such pairs to set signr=0
instead. On x86_64 oops_end is used 'bare' in die_nmi; use
signr=0 instead of regs=NULL there, too.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

874d93d1

14 Oct, 2008 1 commit

x86/mm: unify init task OOM handling · 3a1dfe6e

Committed by Ingo Molnar on Oct 13, 2008

Linus noticed that the "again:" versus "survive:" OOM logic for
the init task was arbitrarily different.

The 64-bit codepath is the better one, because it correctly looks up
the vma again after having dropped the ->mmap_sem.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>

3a1dfe6e

13 Oct, 2008 2 commits

x86/mm: do not trigger a kernel warning if user-space disables interrupts and... · 891cffbd

Committed by Linus Torvalds on Oct 12, 2008

x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault

Arjan reported a spike in the following bug pattern in v2.6.27:

   http://www.kerneloops.org/searchweek.php?search=lock_page

which happens because hwclock started triggering warnings due to
a (correct) might_sleep() check in the MM code.

The warning occurs because hwclock uses this dubious sequence of
code to run "atomic" code:

  static unsigned long
  atomic(const char *name, unsigned long (*op)(unsigned long),
         unsigned long arg)
  {
    unsigned long v;
    __asm__ volatile ("cli");
    v = (*op)(arg);
    __asm__ volatile ("sti");
    return v;
  }

Then it pagefaults in that "atomic" section, triggering the warning.

There is no way the kernel could provide "atomicity" in this path,
a page fault is a cannot-continue machine event so the kernel has to
wait for the page to be filled in.

Even if it was just a minor fault we'd have to take locks and might have
to spend quite a bit of time with interrupts disabled - not nice to irq
latencies in general.

So instead just enable interrupts in the pagefault path unconditionally
if we come from user-space, and handle the fault.

Also, while touching this code, unify some trivial parts of the x86
VM paths at the same time.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

891cffbd

traps: x86: remove trace_hardirqs_fixup from pagefault handler · 69c89b5b

Committed by Alexander van Heukelum on Sep 26, 2008

The last use of trace_hardirqs_fixup is unnecessary, because the
trap is taken with interrupt off on i386 as well as x86_64, and
the irq-tracer is notified of this from the assembly code.

trace_hardirqs_fixup and trace_hardirqs_fixup_flags are removed
from include/asm-x86/irqflags.h as they are no longer used.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

69c89b5b

07 Sep, 2008 3 commits

x86: add periodic corruption check · bb577f98

Committed by Hugh Dickins on Sep 07, 2008

Periodically check for corruption in low physical memory.  Don't bother
checking at fault time, since it won't show anything useful.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

bb577f98

x86: check for and defend against BIOS memory corruption · 5394f80f

Committed by Jeremy Fitzhardinge on Sep 07, 2008

Some BIOSes have been observed to corrupt memory in the low 64k.  This
change:
 - Reserves all memory which does not have to be in that area, to
   prevent it from being used as general memory by the kernel.  Things
   like the SMP trampoline are still in the memory, however.
 - Clears the reserved memory so we can observe changes to it.
 - Adds a function check_for_bios_corruption() which checks and reports on
   memory becoming unexpectedly non-zero.  Currently it's called in the
   x86 fault handler, and the power management debug output.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

5394f80f

x86: adjust vmalloc_sync_all() for Xen (2nd try) · cc643d46

Committed by Jan Beulich on Aug 29, 2008

Since the fourth PDPT entry cannot be shared under Xen,
vmalloc_sync_all() must iterate over pmd-s rather than pgd-s here.
Luckily, the code isn't used for native PAE (SHARED_KERNEL_PMD is 1)
and the change is benign to non-PAE.

Also do a little more cleanup in that function.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
cc643d46

23 Jul, 2008 1 commit

x86: mm/fault.c declare do_page_fault before they get used · 70ef5641

Committed by Jaswinder Singh on Jul 23, 2008

Declare do_page_fault() in asm-x86/trap.h for both X86_32 and X86_64.

Remove the do_invalid_op declaration from mm/fault.c, as it is already declared in asm-x86/trap.h.

Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>

70ef5641

08 Jul, 2008 1 commit

x86: simplify vmalloc_sync_all · 67350a5c

Committed by Jeremy Fitzhardinge on Jun 25, 2008

vmalloc_sync_all() is only called from register_die_notifier and
alloc_vm_area.  Neither is on any performance-critical paths, so
vmalloc_sync_all() itself is not on any hot paths.

Given that the optimisations in vmalloc_sync_all add a fair amount of
code and complexity, and are fairly hard to evaluate for correctness,
it's better to just remove them to simplify the code rather than worry
about its absolute performance.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

67350a5c

03 Jul, 2008 1 commit

x86: remove unnecessary #ifdef CONFIG_X86_32...#else · 95c60b08

Committed by Gustavo Fernando Padovan on Jun 25, 2008

Remove the #ifdef conditional because this comparison is already done in
user_mode_vm().

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Cc: akpm@osdl.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

95c60b08

01 Jul, 2008 1 commit

x86: small unifications of address printing · f294a8ce

Committed by Vegard Nossum on Jul 01, 2008

'man 3 printf' tells me that %p should be printed as if by %#x, but
this is not true for the kernel, which does not use the '0x' prefix
for the %p conversion specifier.
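
The two conversions can be compared with a small user-space sketch. The helper names are hypothetical, and the text produced by %p is implementation-defined in user space, so it may differ from what the kernel's printk() emits.

```c
#include <stdio.h>

/*
 * Format an address two ways. The exact text produced by %p is
 * implementation-defined in user space, so no particular output is
 * assumed for it here.
 */
static int format_with_p(char *buf, size_t len, unsigned long address)
{
	/* The (void *) cast makes an integer address usable with %p. */
	return snprintf(buf, len, "%p", (void *)address);
}

static int format_with_hash_x(char *buf, size_t len, unsigned long address)
{
	return snprintf(buf, len, "%#lx", address);
}
```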

A small cast to (void *) is also prettier than #ifdef/#else/#endif.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f294a8ce

13 Jun, 2008 1 commit

x86: fix endless page faults in mount_block_root for Linux 2.6 · b29c701d

Committed by Henry Nestler on May 12, 2008

Page faults in the kernel address space between PAGE_OFFSET and
VMALLOC_START should not be handled as vmalloc faults.
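
The range check this implies can be sketched as below. The layout constants are assumptions chosen for illustration (the real VMALLOC_START and VMALLOC_END depend on the kernel configuration), and address_in_vmalloc_area() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative 32-bit layout; the actual values depend on the kernel config. */
#define PAGE_OFFSET	((uint32_t)0xC0000000)
#define VMALLOC_START	((uint32_t)0xF8000000)	/* assumption */
#define VMALLOC_END	((uint32_t)0xFF000000)	/* assumption */

/* Only addresses inside the vmalloc window are candidates for vmalloc_fault(). */
static bool address_in_vmalloc_area(uint32_t address)
{
	return address >= VMALLOC_START && address < VMALLOC_END;
}
```

An address such as 0xc03b4000 lies above PAGE_OFFSET but below the vmalloc window, so under this check it would not be treated as a vmalloc fault.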

Fix rare endless page faults inside mount_block_root for the root
filesystem at boot time.

All 32-bit kernels up to 2.6.25 can fall into this hole.
I cannot reproduce this under a native Linux kernel. I see that the 64-bit
code has fixed the problem, so I copied the same lines into the 32-bit part.

The recorded debug output is from coLinux kernel 2.6.22.18 (virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
The physical memory was trimmed down to 192MB to better catch the bug.
With more memory the bug occurs more rarely.

Details of how any 32-bit x86 system can fail:

Start from "mount_block_root",
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" gets one memory page of 4096 bytes.
The variable "p" walks through the existing file system types. The first
string is no problem.
But on the second loop iteration in mount_block_root, "p" no longer points
at the beginning of the page; the offset is, for example, +9 if "reiserfs"
is first in the list.
It then calls do_mount_root and lands in sys_mount.
Remember: the variable "type_page" now contains "fs_type+9" and no longer
covers a full page.
sys_mount copies 4096 bytes with the function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540

Usually the pages after the buffer (up to "fs_names+4096+9") are mapped, so
the page fault handler is not called. No problem.

If the page after "fs_names+4096" is not mapped, however, the page
fault handler is called from http://lxr.linux.no/linux/fs/namespace.c#L1320
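
The arithmetic behind this can be checked with a short sketch. It assumes a 4096-byte page and uses a hypothetical helper, copy_crosses_page(); the buffer address used in the usage note is an assumption chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* True if the byte range [start, start + len) touches more than one page. */
static bool copy_crosses_page(uintptr_t start, size_t len)
{
	uintptr_t first_page = start / PAGE_SIZE;
	uintptr_t last_page = (start + len - 1) / PAGE_SIZE;

	return first_page != last_page;
}
```

A 4096-byte copy starting at a page-aligned address such as 0xc03b3000 stays within one page, while the same copy starting 9 bytes later reads the first 9 bytes of the following page.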

do_page_fault() gets the address 0xc03b4000.
It is a kernel address (address >= TASK_SIZE), but not from vmalloc! It is
from "__getname()" alias "kmem_cache_alloc".
The "error_code" is 0, so "vmalloc_fault" is called:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332

"vmalloc_fault" tried to find the physical page for a nonexistent
virtual memory area. The macro "pte_present" in vmalloc_fault()
then triggered another page fault, for 0xc0000ed0, at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282

No PTE exists for that virtual address. The page fault handler was trying
to sync the physical page for the PTE lookup.

This called vmalloc_fault() again for address 0xc000000, which also
did not exist. The endless loop began...

Normally the CPU would keep looping with interrupts disabled. Under
coLinux this was caught by a stack overflow inside the printk debug output.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

b29c701d

26 May, 2008 1 commit

stackprotector: use canary at end of stack to indicate overruns at oops time · 7c9f8861

Committed by Eric Sandeen on Apr 22, 2008

(Updated with a common max-stack-used checker that knows about
the canary, as suggested by Joe Perches)

Use a canary at the end of the stack to clearly indicate
at oops time whether the stack has ever overflowed.

This is a very simple implementation with a couple of
drawbacks:

1) a thread may legitimately use exactly up to the last
   word on the stack

 -- but the chances of doing this and then oopsing later seem slim

2) it's possible that the stack usage isn't dense enough
   that the canary location could get skipped over

 -- but the worst that happens is that we don't flag the overrun
 -- though this happens fairly often in my testing :(
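
The canary idea can be sketched in user space as follows. This is a model, not the kernel implementation: struct task_stack and both helpers are hypothetical, the stack is assumed to grow down toward words[0], and the constant's name and value are assumptions made for this example.

```c
#include <stdbool.h>

#define STACK_WORDS	1024
#define STACK_END_MAGIC	0x57AC6E9DUL	/* name and value assumed for this sketch */

/* Hypothetical model: the stack grows down toward words[0]. */
struct task_stack {
	unsigned long words[STACK_WORDS];
};

/* Plant the canary in the last word the stack may legitimately reach. */
static void set_stack_canary(struct task_stack *s)
{
	s->words[0] = STACK_END_MAGIC;
}

/* At oops time: a changed canary means the stack was overrun or corrupted. */
static bool stack_overran(const struct task_stack *s)
{
	return s->words[0] != STACK_END_MAGIC;
}
```

Both drawbacks listed above are visible in the model: a legitimate write to words[0] would be flagged, and an overrun that skips over words[0] without touching it would go unnoticed.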

With the code in place, an intentionally-bloated stack oops might
do:

BUG: unable to handle kernel paging request at ffff8103f84cc680
IP: [<ffffffff810253df>] update_curr+0x9a/0xa8
PGD 8063 PUD 0
Thread overran stack or stack corrupted
Oops: 0000 [1] SMP
CPU 0
...

... unless the stack overrun is so bad that it corrupts some other
thread.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
7c9f8861

24 May, 2008 2 commits

x86: mmiotrace full patch, preview 1 · 0fd0e3da

Committed by Pekka Paalanen on May 12, 2008

kmmio.c handles the list of mmio probes with callbacks, list of traced
pages, and attaching into the page fault handler and die notifier. It
arms, traps and disarms the given pages; this is the core of mmiotrace.

mmio-mod.c is a user interface, hooking into ioremap functions and
registering the mmio probes. It also decodes the required information
from trapped mmio accesses via the pre and post callbacks in each probe.
Currently, hooking into ioremap functions works by redefining the symbols
of the target (binary) kernel module, so that it calls the traced
versions of the functions.

The most notable changes done since the last discussion are:
- kmmio.c is a built-in, not part of the module
- direct call from fault.c to kmmio.c, removing all dynamic hooks
- prepare for unregistering probes at any time
- make kmmio re-initializable and accessible to more than one user
- rewrite kmmio locking to remove all spinlocks from page fault path

Can I abuse call_rcu() like I do in kmmio.c:unregister_kmmio_probe()
or is there a better way?

The function called via call_rcu() itself calls call_rcu() again,
will this work or break? There I need a second grace period for RCU
after the first grace period for page faults.

Mmiotrace itself (mmio-mod.c) is still a module, I am going to attack
that next. At some point I will start looking into how to make mmiotrace
a tracer component of ftrace (thanks for the hint, Ingo). Ftrace should
make the user space part of mmiotracing as simple as
'cat /debug/trace/mmio > dump.txt'.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

0fd0e3da

x86: explicit call to mmiotrace in do_page_fault() · 10c43d2e

Committed by Pekka Paalanen on May 12, 2008

The custom page fault handler list is replaced with a single function
pointer. All related functions and variables are renamed for
mmiotrace.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: pq@iki.fi
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

10c43d2e