1. 22 6月, 2009 1 次提交
  2. 16 6月, 2009 1 次提交
    • I
      x86: mm: Read cr2 before prefetching the mmap_lock · 5dfaf90f
      Ingo Molnar 提交于
      Prefetch instructions can generate spurious faults on certain
      models of older CPUs. The faults themselves cannot be stopped
      and they can occur pretty much anywhere - so the way we solve
      them is that we detect certain patterns and ignore the fault.
      
      There is one small path of code where we must not take faults
      though: the #PF handler execution leading up to the reading
      of the CR2 (the faulting address). If we take a fault there
      then we destroy the CR2 value (with that of the prefetching
      instruction's) and possibly mishandle user-space or
      kernel-space pagefaults.
      
      It turns out that in current upstream we do exactly that:
      
      	prefetchw(&mm->mmap_sem);
      
      	/* Get the faulting address: */
      	address = read_cr2();
      
      This is not good.
      
      So turn around the order: first read the cr2 then prefetch
      the lock address. Reading cr2 is plenty fast (2 cycles) so
      delaying the prefetch by this amount shouldnt be a big issue
      performance-wise.
      
      [ And this might explain a mystery fault.c warning that sometimes
        occurs on one an old AMD/Semptron based test-system i have -
        which does have such prefetch problems. ]
      
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      LKML-Reference: <20090616030522.GA22162@Krystal>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5dfaf90f
  3. 15 6月, 2009 1 次提交
    • V
      x86: add hooks for kmemcheck · f8561296
      Vegard Nossum 提交于
      The hooks that we modify are:
      - Page fault handler (to handle kmemcheck faults)
      - Debug exception handler (to hide pages after single-stepping
        the instruction that caused the page fault)
      
      Also redefine memset() to use the optimized version if kmemcheck is
      enabled.
      
      (Thanks to Pekka Enberg for minimizing the impact on the page fault
      handler.)
      
      As kmemcheck doesn't handle MMX/SSE instructions (yet), we also disable
      the optimized xor code, and rely instead on the generic C implementation
      in order to avoid false-positive warnings.
      Signed-off-by: NVegard Nossum <vegardno@ifi.uio.no>
      
      [whitespace fixlet]
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      
      [rebased for mainline inclusion]
      Signed-off-by: NVegard Nossum <vegardno@ifi.uio.no>
      f8561296
  4. 11 6月, 2009 1 次提交
  5. 03 5月, 2009 1 次提交
  6. 09 4月, 2009 1 次提交
  7. 06 4月, 2009 2 次提交
  8. 30 3月, 2009 2 次提交
  9. 22 2月, 2009 1 次提交
  10. 21 2月, 2009 15 次提交
    • I
      x86, mm: fault.c, update copyrights · f8eeb2e6
      Ingo Molnar 提交于
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f8eeb2e6
    • I
      x86, mm: fault.c, give another attempt at prefetch handing before SIGBUS · cd1b68f0
      Ingo Molnar 提交于
      Impact: extend prefetch handling on 64-bit
      
      Currently there's an extra is_prefetch() check done in do_sigbus(),
      which we only do on 32 bits.
      
      This is a last-ditch check before we terminate a task, so it's worth
      giving prefetch instructions another chance - should none of our
      existing quirks have caught a prefetch instruction related spurious
      fault.
      
      The only risk is if a prefetch causes a real sigbus, in that case
      we'll not OOM but try another fault. But this code has been on
      32-bit for a long time, so it should be fine in practice.
      
      So do this on 64-bit too - and thus remove one more #ifdef.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cd1b68f0
    • I
      x86, mm: fault.c, remove #ifdef from fault_in_kernel_space() · 7c178a26
      Ingo Molnar 提交于
      Impact: cleanup
      
      Removal of an #ifdef in fault_in_kernel_space(), by making
      use of the new TASK_SIZE_MAX symbol which is now available
      on 32-bit too.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7c178a26
    • I
      x86, mm: rename TASK_SIZE64 => TASK_SIZE_MAX · d9517346
      Ingo Molnar 提交于
      Impact: cleanup
      
      Rename TASK_SIZE64 to TASK_SIZE_MAX, and provide the
      define on 32-bit too. (mapped to TASK_SIZE)
      
      This allows 32-bit code to make use of the (former-) TASK_SIZE64
      symbol as well, in a clean way.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d9517346
    • I
      x86, mm: fault.c, remove #ifdef from do_page_fault() · c3731c68
      Ingo Molnar 提交于
      Impact: cleanup
      
      do_page_fault() has this ugly #ifdef in its prototype:
      
        #ifdef CONFIG_X86_64
        asmlinkage
        #endif
        void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
      
      Replace it with 'dotraplinkage' which maps to exactly the above
      construct: nothing on 32-bit and asmlinkage on 64-bit.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c3731c68
    • I
      x86, mm: fault.c, unify oops handling · 1cc99544
      Ingo Molnar 提交于
      Impact: add oops-recursion check to 32-bit
      
      Unify the oops state-machine, to the 64-bit version. It is
      slightly more careful in that it does a recursion check
      in oops_begin(), and is thus more likely to show the relevant
      oops.
      
      It also means that 32-bit will print one more line at the
      end of pagefault triggered oopses:
      
       	printk(KERN_EMERG "CR2: %016lx\n", address);
      
      Which is generally good information to be seen in partial-dump
      digital-camera jpegs ;-)
      
      The downside is the somewhat more complex critical path. Both
      variants have been tested well meanwhile by kernel developers
      crashing their boxes so i dont think this is a practical worry.
      
      This removes 3 ugly #ifdefs from no_context() and makes the
      function a lot nicer read.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1cc99544
    • I
      x86, mm: fault.c, unify oops printing · 8f766149
      Ingo Molnar 提交于
      Impact: refine/extend page fault related oops printing on 64-bit
      
       - honor the pause_on_oops logic on 64-bit too
       - print out NX fault warnings on 64-bit as well
       - factor out the NX fault message to make it git-greppable and readable
      
      Note that this means that we do the PF_INSTR check on 32-bit non-PAE
      as well where it should not occur ... normally. Cannot hurt.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8f766149
    • I
      x86, mm: fault.c, reorder functions · f2f13a85
      Ingo Molnar 提交于
      Impact: cleanup
      
      Avoid a couple more #ifdefs by moving fundamentally non-unifiable
      functions into a single #ifdef 32-bit / #else / #endif block in
      fault.c: vmalloc*(), dump_pagetable(), check_vm8086_mode().
      
      No code changed:
      
         text	   data	    bss	    dec	    hex	filename
         4618	     32	     24	   4674	   1242	fault.o.before
         4618	     32	     24	   4674	   1242	fault.o.after
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f2f13a85
    • I
      x86, mm, kprobes: fault.c, simplify notify_page_fault() · b1801812
      Ingo Molnar 提交于
      Impact: cleanup
      
      Remove an #ifdef from notify_page_fault(). The function still
      compiles to nothing in the !CONFIG_KPROBES case.
      
      Introduce kprobes_built_in() and kprobe_fault_handler() helpers
      to allow this - they returns 0 if !CONFIG_KPROBES.
      
      No code changed:
      
         text	   data	    bss	    dec	    hex	filename
         4618	     32	     24	   4674	   1242	fault.o.before
         4618	     32	     24	   4674	   1242	fault.o.after
      
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b1801812
    • I
      x86, mm: fault.c, simplify kmmio_fault() · b814d41f
      Ingo Molnar 提交于
      Impact: cleanup
      
      Remove an #ifdef from kmmio_fault() - we can do this by
      providing default implementations for is_kmmio_active()
      and kmmio_handler(). The compiler optimizes it all away
      in the !CONFIG_MMIOTRACE case.
      
      Also, while at it, clean up mmiotrace.h a bit:
      
       - standard header guards
       - standard vertical spaces for structure definitions
      
      No code changed (both with mmiotrace on and off in the config):
      
         text	   data	    bss	    dec	    hex	filename
         2947	     12	     12	   2971	    b9b	fault.o.before
         2947	     12	     12	   2971	    b9b	fault.o.after
      
      Cc: Pekka Paalanen <pq@iki.fi>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b814d41f
    • I
      x86, mm: fault.c, enable PF_RSVD checks on 32-bit too · 121d5d0a
      Ingo Molnar 提交于
      Impact: improve page fault handling robustness
      
      The 'PF_RSVD' flag (bit 3) of the page-fault error_code is a
      relatively recent addition to x86 CPUs, so the 32-bit do_fault()
      implementation never had it. This flag gets set when the CPU
      detects nonzero values in any reserved bits of the page directory
      entries.
      
      Extend the existing 64-bit check for PF_RSVD in do_page_fault()
      to 32-bit too. If we detect such a fault then we print a more
      informative oops and the pagetables.
      
      This unifies the code some more, removes an ugly #ifdef and improves
      the 32-bit page fault code robustness a bit. It slightly increases
      the 32-bit kernel text size.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      121d5d0a
    • I
      x86, mm: fault.c, factor out the vm86 fault check · 8c938f9f
      Ingo Molnar 提交于
      Impact: cleanup
      
      Instead of an ugly, open-coded, #ifdef-ed vm86 related legacy check
      in do_page_fault(), put it into the check_v8086_mode() helper
      function and merge it with an existing #ifdef.
      
      Also, simplify the code flow a tiny bit in the helper.
      
      No code changed:
      
      arch/x86/mm/fault.o:
      
         text	   data	    bss	    dec	    hex	filename
         2711	     12	     12	   2735	    aaf	fault.o.before
         2711	     12	     12	   2735	    aaf	fault.o.after
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8c938f9f
    • I
      x86, mm: fault.c, refactor/simplify the is_prefetch() code · 107a0367
      Ingo Molnar 提交于
      Impact: no functionality changed
      
      Factor out the opcode checker into a helper inline.
      
      The code got a tiny bit smaller:
      
         text	   data	    bss	    dec	    hex	filename
         4632	     32	     24	   4688	   1250	fault.o.before
         4618	     32	     24	   4674	   1242	fault.o.after
      
      And it got cleaner / easier to review as well.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      107a0367
    • I
      x86, mm: fault.c cleanup · 2d4a7167
      Ingo Molnar 提交于
      Impact: cleanup, no code changed
      
      Clean up various small details, which can be correctness checked
      automatically:
      
       - tidy up the include file section
       - eliminate unnecessary includes
       - introduce show_signal_msg() to clean up code flow
       - standardize the code flow
       - standardize comments and other style details
       - more cleanups, pointed out by checkpatch
      
      No code changed on either 32-bit nor 64-bit:
      
      arch/x86/mm/fault.o:
      
         text	   data	    bss	    dec	    hex	filename
         4632	     32	     24	   4688	   1250	fault.o.before
         4632	     32	     24	   4688	   1250	fault.o.after
      
      the md5 changed due to a change in a single instruction:
      
         2e8a8241e7f0d69706776a5a26c90bc0  fault.o.before.asm
         c5c3d36e725586eb74f0e10692f0193e  fault.o.after.asm
      
      Because a __LINE__ reference in a WARN_ONCE() has changed.
      
      On 32-bit a few stack offsets changed - no code size difference
      nor any functionality difference.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2d4a7167
    • S
      x86: check PMD in spurious_fault handler · 3c3e5694
      Steven Rostedt 提交于
      Impact: fix to prevent hard lockup on bad PMD permissions
      
      If the PMD does not have the correct permissions for a page access,
      but the PTE does, the spurious fault handler will mistake the fault
      as a lazy TLB transaction. This will result in an infinite loop of:
      
       fault -> spurious_fault check (pass) -> return to code -> fault
      
      This patch adds a check and a warn on if the PTE passes the permissions
      but the PMD does not.
      
      [ Updated: Ingo Molnar suggested using WARN_ONCE with some text ]
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      3c3e5694
  11. 06 2月, 2009 1 次提交
  12. 05 2月, 2009 1 次提交
  13. 29 1月, 2009 1 次提交
  14. 22 1月, 2009 1 次提交
  15. 20 1月, 2009 1 次提交
    • N
      x86: optimise x86's do_page_fault (C entry point for the page fault path) · 92181f19
      Nick Piggin 提交于
      Impact: cleanup, restructure code to improve assembly
      
      gcc isn't _all_ that smart about spilling registers to stack or reusing
      stack slots, even with branch annotations. do_page_fault contained a lot
      of functionality, so split unlikely paths into their own functions, and
      mark them as noinline just to be sure. I consider this actually to be
      somewhat of a cleanup too: the main function now contains about half
      the number of lines so the normal path is easier to read, while the error
      cases are also nicely split away.
      
      Also, ensure the order of arguments to functions is always the same: regs,
      addr, error_code. This can reduce code size a tiny bit, and just looks neater
      too.
      
      And add a couple of branch annotations.
      
      Before:
        do_page_fault:
                subq    $360, %rsp      #,
      
      After:
        do_page_fault:
                subq    $56, %rsp       #,
      
      bloat-o-meter:
        add/remove: 8/0 grow/shrink: 0/1 up/down: 2222/-1680 (542)
        function                                     old     new   delta
        __bad_area_nosemaphore                         -     506    +506
        no_context                                     -     474    +474
        vmalloc_fault                                  -     424    +424
        spurious_fault                                 -     358    +358
        mm_fault_error                                 -     272    +272
        bad_area_access_error                          -      89     +89
        bad_area                                       -      89     +89
        bad_area_nosemaphore                           -      10     +10
        do_page_fault                               2464     784   -1680
      
      Yes, the total size increases by 542 bytes, due to the extra function calls.
      But these will very rarely be called (except for vmalloc_fault) in a normal
      workload. Importantly, do_page_fault is less than 1/3rd it's original size,
      and touches far less stack.
      
      Existing gotos and branch hints did move a lot of the infrequently used text
      out of the fastpath, but that's even further improved after this patch.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      92181f19
  16. 13 1月, 2009 1 次提交
    • A
      x86: avoid theoretical vmalloc fault loop · f313e123
      Andi Kleen 提交于
      Ajith Kumar noticed:
      
       I was going through the vmalloc fault handling for x86_64 and am unclear
       about the following lines in the vmalloc_fault() function.
      
       pgd = pgd_offset(current->mm ?: &init_mm, address);
       pgd_ref = pgd_offset_k(address);
      
       Here the intention is to get the pgd corresponding to the current process
       and sync it up with the pgd in init_mm(obtained from pgd_offset_k).
       However, for kernel threads current->mm is NULL and hence pgd =
       pgd_offset(init_mm, address) = pgd_ref which means the fault handler
       returns without setting the pgd entry in the MM structure in the context
       of which the kernel thread has faulted.  This could lead to never-ending
       faults and busy looping of kernel threads like pdflush.  So, shouldn't the
       pgd = pgd_offset(current->mm ?: &init_mm, address); be pgd =
       pgd_offset(current->active_mm ?: &init_mm, address);
      
      We can use active_mm unconditionally because it should be always set.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f313e123
  17. 07 1月, 2009 1 次提交
    • N
      mm: invoke oom-killer from page fault · 1c0fe6e3
      Nick Piggin 提交于
      Rather than have the pagefault handler kill a process directly if it gets
      a VM_FAULT_OOM, have it call into the OOM killer.
      
      With increasingly sophisticated oom behaviour (cpusets, memory cgroups,
      oom killing throttling, oom priority adjustment or selective disabling,
      panic on oom, etc), it's silly to unconditionally kill the faulting
      process at page fault time.  Create a hook for pagefault oom path to call
      into instead.
      
      Only converted x86 and uml so far.
      
      [akpm@linux-foundation.org: make __out_of_memory() static]
      [akpm@linux-foundation.org: fix comment]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Jeff Dike <jdike@addtoit.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c0fe6e3
  18. 14 11月, 2008 1 次提交
  19. 27 10月, 2008 1 次提交
  20. 22 10月, 2008 1 次提交
  21. 14 10月, 2008 1 次提交
  22. 13 10月, 2008 2 次提交
    • L
      x86/mm: do not trigger a kernel warning if user-space disables interrupts and... · 891cffbd
      Linus Torvalds 提交于
      x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault
      
      Arjan reported a spike in the following bug pattern in v2.6.27:
      
         http://www.kerneloops.org/searchweek.php?search=lock_page
      
      which happens because hwclock started triggering warnings due to
      a (correct) might_sleep() check in the MM code.
      
      The warning occurs because hwclock uses this dubious sequence of
      code to run "atomic" code:
      
        static unsigned long
        atomic(const char *name, unsigned long (*op)(unsigned long),
               unsigned long arg)
        {
          unsigned long v;
          __asm__ volatile ("cli");
          v = (*op)(arg);
          __asm__ volatile ("sti");
          return v;
        }
      
      Then it pagefaults in that "atomic" section, triggering the warning.
      
      There is no way the kernel could provide "atomicity" in this path,
      a page fault is a cannot-continue machine event so the kernel has to
      wait for the page to be filled in.
      
      Even if it was just a minor fault we'd have to take locks and might have
      to spend quite a bit of time with interrupts disabled - not nice to irq
      latencies in general.
      
      So instead just enable interrupts in the pagefault path unconditionally
      if we come from user-space, and handle the fault.
      
      Also, while touching this code, unify some trivial parts of the x86
      VM paths at the same time.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: NArjan van de Ven <arjan@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      891cffbd
    • A
      traps: x86: remove trace_hardirqs_fixup from pagefault handler · 69c89b5b
      Alexander van Heukelum 提交于
      The last use of trace_hardirqs_fixup is unnecessary, because the
      trap is taken with interrupt off on i386 as well as x86_64, and
      the irq-tracer is notified of this from the assembly code.
      
      trace_hardirqs_fixup and trace_hardirqs_fixup_flags are removed
      from include/asm-x86/irqflags.h as they are no longer used.
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      69c89b5b
  23. 07 9月, 2008 1 次提交