1. 13 6月, 2008 1 次提交
    • H
      x86: fix endless page faults in mount_block_root for Linux 2.6 · b29c701d
      Henry Nestler 提交于
      Page faults in kernel address space between PAGE_OFFSET up to
      VMALLOC_START should not try to map as vmalloc.
      
      Fix rarely endless page faults inside mount_block_root for root
      filesystem at boot time.
      
      All 32bit kernels up to 2.6.25 can fail into this hole.
      I can not present this under native linux kernel. I see, that the 64bit
      has fixed the problem. I copied the same lines into 32bit part.
      
      Recorded debugs are from coLinux kernel 2.6.22.18 (virtualisation):
      http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
      The physicaly memory was trimmed down to 192MB to better catch the bug.
      More memory gets the bug more rarely.
      
      Details, how every x86 32bit system can fail:
      
      Start from "mount_block_root",
      http://lxr.linux.no/linux/init/do_mounts.c#L297
      There the variable "fs_names" got one memory page with 4096 bytes.
      Variable "p" walks through the existing file system types. The first
      string is no problem.
      But, with the second loop in mount_block_root the offset of "p" is not
      at beginning of page, the offset is for example +9, if "reiserfs" is the
      first in list.
      Than calls do_mount_root, and lands in sys_mount.
      Remember: Variable "type_page" contains now "fs_type+9" and not contains
      a full page.
      The sys_mount copies 4096 bytes with function "exact_copy_from_user()":
      http://lxr.linux.no/linux/fs/namespace.c#L1540
      
      Mostly exist pages after the buffer "fs_names+4096+9" and the page fault
      handler was not called. No problem.
      
      In the case, if the page after "fs_names+4096" is not mapped, the page
      fault handler was called from http://lxr.linux.no/linux/fs/namespace.c#L1320
      
      The do_page_fault gots an address 0xc03b4000.
      It's kernel address, address >= TASK_SIZE, but not from vmalloc! It's
      from "__getname()" alias "kmem_cache_alloc".
      The "error_code" is 0. "vmalloc_fault" will be call:
      http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332
      
      "vmalloc_fault" tryed to find the physical page for a non existing
      virtual memory area. The macro "pte_present" in vmalloc_fault()
      got a next page fault for 0xc0000ed0 at:
      http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282
      
      No PTE exist for such virtual address. The page fault handler was trying
      to sync the physical page for the PTE lockup.
      
      This called vmalloc_fault() again for address 0xc000000, and that also
      was not existing. The endless began...
      
      In normal case the cpu would still loop with disabled interrrupts. Under
      coLinux this was catched by a stack overflow inside printk debugs.
      Signed-off-by: NHenry Nestler <henry.nestler@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      b29c701d
  2. 17 4月, 2008 2 次提交
  3. 28 3月, 2008 1 次提交
    • I
      x86: prefetch fix #2 · 3085354d
      Ingo Molnar 提交于
      Linus noticed a second bug and an uncleanliness:
      
       - we'd return on any instruction fetch fault
      
       - we'd use both the value of 16 and the PF_INSTR symbol which are
         the same and make no sense
      
      the cleanup nicely unifies this piece of logic.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3085354d
  4. 27 3月, 2008 1 次提交
  5. 15 2月, 2008 1 次提交
  6. 07 2月, 2008 2 次提交
    • I
      x86: fix deadlock, make pgd_lock irq-safe · 58d5d0d8
      Ingo Molnar 提交于
      lockdep just caught this one:
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.24 #38
      ---------------------------------
      inconsistent {in-softirq-W} -> {softirq-on-W} usage.
      swapper/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (pgd_lock){-+..}, at: [<ffffffff8022a9ea>] mm_init+0x1da/0x250
      {in-softirq-W} state was registered at:
        [<ffffffffffffffff>] 0xffffffffffffffff
      irq event stamp: 394559
      hardirqs last  enabled at (394559): [<ffffffff80267f0a>] get_page_from_freelist+0x30a/0x4c0
      hardirqs last disabled at (394558): [<ffffffff80267d25>] get_page_from_freelist+0x125/0x4c0
      softirqs last  enabled at (393952): [<ffffffff80232f8e>] __do_softirq+0xce/0xe0
      softirqs last disabled at (393945): [<ffffffff8020c57c>] call_softirq+0x1c/0x30
      
      other info that might help us debug this:
      no locks held by swapper/1.
      
      stack backtrace:
      Pid: 1, comm: swapper Not tainted 2.6.24 #38
      
      Call Trace:
       [<ffffffff8024e1fb>] print_usage_bug+0x18b/0x190
       [<ffffffff8024f55d>] mark_lock+0x53d/0x560
       [<ffffffff8024fffa>] __lock_acquire+0x3ca/0xed0
       [<ffffffff80250ba8>] lock_acquire+0xa8/0xe0
       [<ffffffff8022a9ea>] ? mm_init+0x1da/0x250
       [<ffffffff809bcd10>] _spin_lock+0x30/0x70
       [<ffffffff8022a9ea>] mm_init+0x1da/0x250
       [<ffffffff8022aa99>] mm_alloc+0x39/0x50
       [<ffffffff8028b95a>] bprm_mm_init+0x2a/0x1a0
       [<ffffffff8028d12b>] do_execve+0x7b/0x220
       [<ffffffff80209776>] sys_execve+0x46/0x70
       [<ffffffff8020c214>] kernel_execve+0x64/0xd0
       [<ffffffff8020901e>] ? _stext+0x1e/0x20
       [<ffffffff802090ba>] init_post+0x9a/0xf0
       [<ffffffff809bc5f6>] ? trace_hardirqs_on_thunk+0x35/0x3a
       [<ffffffff8024f75a>] ? trace_hardirqs_on+0xba/0xd0
       [<ffffffff8020c1a8>] ? child_rip+0xa/0x12
       [<ffffffff8020bcbc>] ? restore_args+0x0/0x44
       [<ffffffff8020c19e>] ? child_rip+0x0/0x12
      
      turns out that pgd_lock has been used on 64-bit x86 in an irq-unsafe
      way for almost two years, since commit 8c914cb7.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      58d5d0d8
    • T
      x86: make spurious fault handler aware of large mappings · d8b57bb7
      Thomas Gleixner 提交于
      In very rare cases, on certain CPUs, we could end up in the spurious
      fault handler and ignore a large pud/pmd mapping. The resulting pte
      pointer points into the mapped physical space and dereferencing it
      will fault recursively.
      
      Make the code aware of large mappings and do the permission check
      on the pmd/pud entry, when a large pud/pmd mapping is detected.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d8b57bb7
  7. 04 2月, 2008 2 次提交
  8. 02 2月, 2008 1 次提交
  9. 30 1月, 2008 29 次提交