1. 11 10月, 2007 1 次提交
  2. 20 9月, 2007 1 次提交
    • L
      x86-64: page faults from user mode are always user faults · dbe3ed1c
      Linus Torvalds 提交于
      Randy Dunlap noticed an interesting "crashme" behaviour on his dual
      Prescott Xeon setup, where he gets page faults with the error code
      having a zero "user" bit, but the register state points back to user
      mode.
      
      This may be a CPU microcode buglet triggered by some strange instruction
      pattern that crashme generates, and loading a microcode update seems to
      possibly have fixed it.
      
      Regardless, we really should trust the register state more than the
      error code, since it's really the register state that determines whether
      we can actually send a signal, or whether we're in kernel mode and need
      to oops/kill the process in the case of a page fault.
      
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dbe3ed1c
  3. 23 7月, 2007 2 次提交
    • G
      x86_64: Use read and write crX in .c files · f51c9452
      Glauber de Oliveira Costa 提交于
      This patch uses the read and write functions provided at system.h
      for control registers instead of writting raw assembly over and
      over again in .c files. Functions to manipulate cr2 and cr8 were
      provided, as they were lacking.
      
      Also, removed some extra space after closing brackets
      Signed-off-by: NGlauber de Oliveira Costa <gcosta@redhat.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f51c9452
    • M
      x86: i386-show-unhandled-signals-v3 · abd4f750
      Masoud Asgharifard Sharbiani 提交于
      This patch makes the i386 behave the same way that x86_64 does when a
      segfault happens.  A line gets printed to the kernel log so that tools
      that need to check for failures can behave more uniformly between
      debug.show_unhandled_signals sysctl variable to 0 (or by doing echo 0 >
      /proc/sys/debug/exception-trace)
      
      Also, all of the lines being printed are now using printk_ratelimit() to
      deny the ability of DoS from a local user with a program like the
      following:
      
      main()
      {
             while (1)
                     if (!fork()) *(int *)0 = 0;
      }
      
      This new revision also includes the fix that Andrew did which got rid of
      new sysctl that was added to the system in earlier versions of this.
      Also, 'show-unhandled-signals' sysctl has been renamed back to the old
      'exception-trace' to avoid breakage of people's scripts.
      
      AK: Enabling by default for i386 will be likely controversal, but let's see what happens
      AK: Really folks, before complaining just fix your segfaults
      AK: I bet this will find a lot of silent issues
      Signed-off-by: NMasoud Sharbiani <masouds@google.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      [ Personally, I've found the complaints useful on x86-64, so I'm all for
        this. That said, I wonder if we could do it more prettily..   -Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      abd4f750
  4. 22 7月, 2007 2 次提交
  5. 20 7月, 2007 1 次提交
    • N
      mm: fault feedback #2 · 83c54070
      Nick Piggin 提交于
      This patch completes Linus's wish that the fault return codes be made into
      bit flags, which I agree makes everything nicer.  This requires requires
      all handle_mm_fault callers to be modified (possibly the modifications
      should go further and do things like fault accounting in handle_mm_fault --
      however that would be for another patch).
      
      [akpm@linux-foundation.org: fix alpha build]
      [akpm@linux-foundation.org: fix s390 build]
      [akpm@linux-foundation.org: fix sparc build]
      [akpm@linux-foundation.org: fix sparc64 build]
      [akpm@linux-foundation.org: fix ia64 build]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Bryan Wu <bryan.wu@analog.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Acked-by: NKyle McMartin <kyle@mcmartin.ca>
      Acked-by: NHaavard Skinnemoen <hskinnemoen@atmel.com>
      Acked-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NAndi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Still apparently needs some ARM and PPC loving - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83c54070
  6. 08 6月, 2007 1 次提交
    • S
      enable interrupts in user path of page fault. · e5e3c84b
      Steven Rostedt 提交于
      This is a minor fix, but what is currently there is essentially wrong.
      In do_page_fault, if the faulting address from user code happens to be
      in kernel address space (int *p = (int*)-1; p = 0xbed;)  then the
      do_page_fault handler will jump over the local_irq_enable with the
      
        goto bad_area_nosemaphore;
      
      But the first line there sees this is user code and goes through the
      process of sending a signal to send SIGSEGV to the user task. This whole
      time interrupts are disabled and the task can not be preempted by a
      higher priority task.
      
      This patch always enables interrupts in the user path of the
      bad_area_nosemaphore.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e5e3c84b
  7. 09 5月, 2007 2 次提交
  8. 03 5月, 2007 1 次提交
  9. 13 2月, 2007 1 次提交
  10. 12 2月, 2007 1 次提交
  11. 07 12月, 2006 1 次提交
  12. 30 9月, 2006 2 次提交
    • S
      [PATCH] pidspace: is_init() · f400e198
      Sukadev Bhattiprolu 提交于
      This is an updated version of Eric Biederman's is_init() patch.
      (http://lkml.org/lkml/2006/2/6/280).  It applies cleanly to 2.6.18-rc3 and
      replaces a few more instances of ->pid == 1 with is_init().
      
      Further, is_init() checks pid and thus removes dependency on Eric's other
      patches for now.
      
      Eric's original description:
      
      	There are a lot of places in the kernel where we test for init
      	because we give it special properties.  Most  significantly init
      	must not die.  This results in code all over the kernel test
      	->pid == 1.
      
      	Introduce is_init to capture this case.
      
      	With multiple pid spaces for all of the cases affected we are
      	looking for only the first process on the system, not some other
      	process that has pid == 1.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: <lxc-devel@lists.sourceforge.net>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f400e198
    • J
      [PATCH] make PROT_WRITE imply PROT_READ · df67b3da
      Jason Baron 提交于
      Make PROT_WRITE imply PROT_READ for a number of architectures which don't
      support write only in hardware.
      
      While looking at this, I noticed that some architectures which do not
      support write only mappings already take the exact same approach.  For
      example, in arch/alpha/mm/fault.c:
      
      "
              if (cause < 0) {
                      if (!(vma->vm_flags & VM_EXEC))
                              goto bad_area;
              } else if (!cause) {
                      /* Allow reads even for write-only mappings */
                      if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
                              goto bad_area;
              } else {
                      if (!(vma->vm_flags & VM_WRITE))
                              goto bad_area;
              }
      "
      
      Thus, this patch brings other architectures which do not support write only
      mappings in-line and consistent with the rest.  I've verified the patch on
      ia64, x86_64 and x86.
      
      Additional discussion:
      
      Several architectures, including x86, can not support write-only mappings.
      The pte for x86 reserves a single bit for protection and its two states are
      read only or read/write.  Thus, write only is not supported in h/w.
      
      Currently, if i 'mmap' a page write-only, the first read attempt on that page
      creates a page fault and will SEGV.  That check is enforced in
      arch/blah/mm/fault.c.  However, if i first write that page it will fault in
      and the pte will be set to read/write.  Thus, any subsequent reads to the page
      will succeed.  It is this inconsistency in behavior that this patch is
      attempting to address.  Furthermore, if the page is swapped out, and then
      brought back the first read will also cause a SEGV.  Thus, any arbitrary read
      on a page can potentially result in a SEGV.
      
      According to the SuSv3 spec, "if the application requests only PROT_WRITE, the
      implementation may also allow read access." Also as mentioned, some
      archtectures, such as alpha, shown above already take the approach that i am
      suggesting.
      
      The counter-argument to this raised by Arjan, is that the kernel is enforcing
      the write only mapping the best it can given the h/w limitations.  This is
      true, however Alan Cox, and myself would argue that the inconsitency in
      behavior, that is applications can sometimes work/sometimes fails is highly
      undesireable.  If you read through the thread, i think people, came to an
      agreement on the last patch i posted, as nobody has objected to it...
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: NAndi Kleen <ak@muc.de>
      Acked-by: NAlan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Acked-by: NPaul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Ian Molton <spyro@f2s.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      df67b3da
  13. 26 9月, 2006 3 次提交
    • D
      [PATCH] Standardize pxx_page macros · 46a82b2d
      Dave McCracken 提交于
      One of the changes necessary for shared page tables is to standardize the
      pxx_page macros.  pte_page and pmd_page have always returned the struct
      page associated with their entry, while pte_page_kernel and pmd_page_kernel
      have returned the kernel virtual address.  pud_page and pgd_page, on the
      other hand, return the kernel virtual address.
      
      Shared page tables needs pud_page and pgd_page to return the actual page
      structures.  There are very few actual users of these functions, so it is
      simple to standardize their usage.
      
      Since this is basic cleanup, I am submitting these changes as a standalone
      patch.  Per Hugh Dickins' comments about it, I am also changing the
      pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.
      Signed-off-by: NDave McCracken <dmccr@us.ibm.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      46a82b2d
    • A
      [PATCH] make fault notifier unconditional and export it · 273819a2
      Andi Kleen 提交于
      It's needed for external debuggers and overhead is very small.
      
      Also make the actual notifier chain they use static
      
      Cc: jbeulich@novell.com
      Signed-off-by: NAndi Kleen <ak@suse.de>
      273819a2
    • A
      [PATCH] Add sparse annotations to quiet sparse in arch/x86_64/mm/fault.c · dd2994f6
      Andi Kleen 提交于
      Fixes
      
      linux/arch/x86_64/mm/fault.c:125:7: warning: incorrect type in argument 1 (different address spaces)
      linux/arch/x86_64/mm/fault.c:125:7:    expected void [noderef] *<noident><asn:1>
      linux/arch/x86_64/mm/fault.c:125:7:    got unsigned char *[assigned] instr
      linux/arch/x86_64/mm/fault.c:163:8: warning: incorrect type in argument 1 (different address spaces)
      linux/arch/x86_64/mm/fault.c:163:8:    expected void [noderef] *<noident><asn:1>
      linux/arch/x86_64/mm/fault.c:163:8:    got unsigned char *[assigned] instr
      linux/arch/x86_64/mm/fault.c:179:9: warning: incorrect type in argument 1 (different address spaces)
      linux/arch/x86_64/mm/fault.c:179:9:    expected void [noderef] *<noident><asn:1>
      linux/arch/x86_64/mm/fault.c:179:9:    got unsigned long *<noident>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      dd2994f6
  14. 04 7月, 2006 1 次提交
  15. 01 7月, 2006 2 次提交
  16. 27 6月, 2006 3 次提交
    • C
      [PATCH] x86_64: enlarge window for stack growth · 03fdc2c2
      Chuck Ebbert 提交于
      Allow stack growth so the 'enter' instruction works.  Also
      fixes problem in compat_sys_kexec_load() which could allocate
      more than 128 bytes using compat_alloc_user_space().
      Signed-off-by: NChuck Ebbert <76306.1226@compuserve.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      03fdc2c2
    • A
      [PATCH] x86_64: Get rid of pud_offset_k / __pud_offset_k · d2ae5b5f
      Andi Kleen 提交于
      pud_offset_k() equivalent to pud_offset() now.  Pointed out by Jan Beulich
      Similar for __pud_offset_ok, which needs a small change in the callers.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d2ae5b5f
    • A
      [PATCH] Notify page fault call chain for x86_64 · 1bd858a5
      Anil S Keshavamurthy 提交于
      Currently in the do_page_fault() code path, we call notify_die(DIE_PAGE_FAULT,
      ...) to notify the page fault.  Since notify_die() is highly overloaded, this
      page fault notification is currently being sent to all the components
      registered with register_die_notification() which uses the same die_chain to
      loop for all the registered components which is unnecessary.
      
      In order to optimize the do_page_fault() code path, this critical page fault
      notification is now moved to different call chain and the test results showed
      great improvements.
      
      And the kprobes which is interested in this notifications, now registers onto
      this new call chain only when it need to, i.e Kprobes now registers for page
      fault notification only when their are an active probes and unregisters from
      this page fault notification when no probes are active.
      
      I have incorporated all the feedback given by Ananth and Keith and everyone,
      and thanks for all the review feedback.
      
      This patch:
      
      Overloading of page fault notification with the notify_die() has performance
      issues(since the only interested components for page fault is kprobes and/or
      kdb) and hence this patch introduces the new notifier call chain exclusively
      for page fault notifications their by avoiding notifying unnecessary
      components in the do_page_fault() code path.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1bd858a5
  17. 01 4月, 2006 1 次提交
    • O
      [PATCH] Don't pass boot parameters to argv_init[] · 9b41046c
      OGAWA Hirofumi 提交于
      The boot cmdline is parsed in parse_early_param() and
      parse_args(,unknown_bootoption).
      
      And __setup() is used in obsolete_checksetup().
      
      	start_kernel()
      		-> parse_args()
      			-> unknown_bootoption()
      				-> obsolete_checksetup()
      
      If __setup()'s callback (->setup_func()) returns 1 in
      obsolete_checksetup(), obsolete_checksetup() thinks a parameter was
      handled.
      
      If ->setup_func() returns 0, obsolete_checksetup() tries other
      ->setup_func().  If all ->setup_func() that matched a parameter returns 0,
      a parameter is seted to argv_init[].
      
      Then, when runing /sbin/init or init=app, argv_init[] is passed to the app.
      If the app doesn't ignore those arguments, it will warning and exit.
      
      This patch fixes a wrong usage of it, however fixes obvious one only.
      Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9b41046c
  18. 26 3月, 2006 2 次提交
    • A
      [PATCH] x86_64: prefetch the mmap_sem in the fault path · a9ba9a3b
      Arjan van de Ven 提交于
      In a micro-benchmark that stresses the pagefault path, the down_read_trylock
      on the mmap_sem showed up quite high on the profile. Turns out this lock is
      bouncing between cpus quite a bit and thus is cache-cold a lot. This patch
      prefetches the lock (for write) as early as possible (and before some other
      somewhat expensive operations). With this patch, the down_read_trylock
      basically fell out of the top of profile.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a9ba9a3b
    • J
      [PATCH] x86_64: actively synchronize vmalloc area when registering certain callbacks · 8c914cb7
      Jan Beulich 提交于
      While the modular aspect of the respective i386 patch doesn't apply to
      x86-64 (as the top level page directory entry is shared between modules
      and the base kernel), handlers registered with register_die_notifier()
      are still under similar constraints for touching ioremap()ed or
      vmalloc()ed memory. The likelihood of this problem becoming visible is
      of course significantly lower, as the assigned virtual addresses would
      have to cross a 2**39 byte boundary. This is because the callback gets
      invoked
      (a) in the page fault path before the top level page table propagation
      gets carried out (hence a fault to propagate the top level page table
      entry/entries mapping to module's code/data would nest infinitly) and
      (b) in the NMI path, where nested faults must absolutely not happen,
      since otherwise the IRET from the nested fault re-enables NMIs,
      potentially resulting in nested NMI occurences.
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8c914cb7
  19. 05 2月, 2006 1 次提交
  20. 12 1月, 2006 4 次提交
  21. 15 11月, 2005 1 次提交
    • A
      [PATCH] x86_64: Remove CONFIG_CHECKING and add command line option for pagefault tracing · 9e43e1b7
      Andi Kleen 提交于
      CONFIG_CHECKING covered some debugging code used in the early times
      of the port. But it wasn't even SMP safe for quite some time
      and the bugs it checked for seem to be gone.
      
      This patch removes all the code to verify GS at kernel entry. There
      haven't been any new bugs in this area for a long time.
      
      Previously it also covered the sysctl for the page fault tracing.
      That didn't make much sense because that code was unconditionally
      compiled in. I made that a boot option now because it is typically
      only useful at boot.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9e43e1b7
  22. 13 9月, 2005 1 次提交
  23. 08 9月, 2005 1 次提交
  24. 20 8月, 2005 1 次提交
  25. 04 8月, 2005 1 次提交
  26. 29 7月, 2005 1 次提交
  27. 24 6月, 2005 1 次提交