1. 10 3月, 2011 2 次提交
    • A
      x86/mm: Fix pgd_lock deadlock · a79e53d8
      Andrea Arcangeli 提交于
      It's forbidden to take the page_table_lock with the irq disabled
      or if there's contention the IPIs (for tlb flushes) sent with
      the page_table_lock held will never run leading to a deadlock.
      
      Nobody takes the pgd_lock from irq context so the _irqsave can be
      removed.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a79e53d8
    • A
      x86/mm: Handle mm_fault_error() in kernel space · f8626854
      Andrey Vagin 提交于
      mm_fault_error() should not execute oom-killer, if page fault
      occurs in kernel space.  E.g. in copy_from_user()/copy_to_user().
      
      This would happen if we find ourselves in OOM on a
      copy_to_user(), or a copy_from_user() which faults.
      
      Without this patch, the kernels hangs up in copy_from_user(),
      because OOM killer sends SIG_KILL to current process, but it
      can't handle a signal while in syscall, then the kernel returns
      to copy_from_user(), reexcute current command and provokes
      page_fault again.
      
      With this patch the kernel return -EFAULT from copy_from_user().
      
      The code, which checks that page fault occurred in kernel space,
      has been copied from do_sigbus().
      
      This situation is handled by the same way on powerpc, xtensa,
      tile, ...
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <201103092322.p29NMNPH001682@imap1.linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f8626854
  2. 09 3月, 2011 2 次提交
  3. 04 3月, 2011 1 次提交
    • Y
      x86, numa: Fix numa_emulation code with memory-less node0 · 3b28cf32
      Yinghai Lu 提交于
      This crash happens on a system that does not have RAM on node0.
      
      When numa_emulation is compiled in, and:
      
       1. we boot the system without numa=fake...
       2. or we boot the system with numa=fake=128 to make emulation fail
      
      we will get:
      
      [    0.076025] ------------[ cut here ]------------
      [    0.080004] kernel BUG at arch/x86/mm/numa_64.c:788!
      [    0.080004] invalid opcode: 0000 [#1] SMP
      [...]
      
      need to use early_cpu_to_node() directly, because cpu_to_apicid
      and apicid_to_node will return node0 that is not onlined.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      LKML-Reference: <4D6ECF72.5010308@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3b28cf32
  4. 02 3月, 2011 1 次提交
  5. 28 2月, 2011 1 次提交
    • D
      x86: Use u32 instead of long to set reset vector back to 0 · 299c5696
      Don Zickus 提交于
      A customer of ours, complained that when setting the reset
      vector back to 0, it trashed other data and hung their box.
      They noticed when only 4 bytes were set to 0 instead of 8,
      everything worked correctly.
      
      Mathew pointed out:
      
       |
       | We're supposed to be resetting trampoline_phys_low and
       | trampoline_phys_high here, which are two 16-bit values.
       | Writing 64 bits is definitely going to overwrite space
       | that we're not supposed to be touching.
       |
      
      So limit the area modified to u32.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Acked-by: NMatthew Garrett <mjg@redhat.com>
      Cc: <stable@kernel.org>
      LKML-Reference: <1297139100-424-1-git-send-email-dzickus@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      299c5696
  6. 25 2月, 2011 1 次提交
  7. 24 2月, 2011 1 次提交
    • J
      x86/mrst: Fix apb timer rating when lapic timer is used · 7b62dbec
      Jacob Pan 提交于
      Need to adjust the clockevent device rating for the structure
      that will be registered with clockevent system instead of the
      temporary structure.
      
      Without this fix, APB timer rating will be higher than LAPIC
      timer such that it can not be released later to be used as the
      broadcast timer.
      Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      LKML-Reference: <1298506046-439-1-git-send-email-jacob.jun.pan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7b62dbec
  8. 22 2月, 2011 1 次提交
  9. 21 2月, 2011 1 次提交
  10. 16 2月, 2011 1 次提交
  11. 15 2月, 2011 2 次提交
    • N
      x86, dmi, debug: Log board name (when present) in dmesg/oops output · 84e383b3
      Naga Chumbalkar 提交于
      The "Type 2" SMBIOS record that contains Board Name is not
      strictly required and may be absent in the SMBIOS on some
      platforms.
      
      ( Please note that Type 2 is not listed in Table 3 in Sec 6.2
        ("Required Structures and Data") of the SMBIOS v2.7
        Specification. )
      
      Use the Manufacturer Name (aka System Vendor) name.
      Print Board Name only when it is present.
      
      Before the fix:
        (i) dmesg output: DMI: /ProLiant DL380 G6, BIOS P62 01/29/2011
       (ii) oops output:  Pid: 2170, comm: bash Not tainted 2.6.38-rc4+ #3 /ProLiant DL380 G6
      
      After the fix:
        (i) dmesg output: DMI: HP ProLiant DL380 G6, BIOS P62 01/29/2011
       (ii) oops output:  Pid: 2278, comm: bash Not tainted 2.6.38-rc4+ #4 HP ProLiant DL380 G6
      Signed-off-by: NNaga Chumbalkar <nagananda.chumbalkar@hp.com>
      Reviewed-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: <stable@kernel.org> # .3x - good for debugging, please apply as far back as it applies cleanly
      LKML-Reference: <20110214224423.2182.13929.sendpatchset@nchumbalkar.americas.hpqcorp.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      84e383b3
    • P
      x86, ioapic: Don't warn about non-existing IOAPICs if we have none · 678301ec
      Paul Bolle 提交于
      mp_find_ioapic() prints errors like:
      
          ERROR: Unable to locate IOAPIC for GSI 13
      
      if it can't find the IOAPIC that manages that specific GSI. I
      see errors like that at every boot of a laptop that apparently
      doesn't have any IOAPICs.
      
      But if there are no IOAPICs it doesn't seem to be an error that
      none can be found. A solution that gets rid of this message is
      to directly return if nr_ioapics (still) is zero. (But keep
      returning -1 in that case, so nothing breaks from this change.)
      
      The call chain that generates this error is:
      
      pnpacpi_allocated_resource()
          case ACPI_RESOURCE_TYPE_IRQ:
              pnpacpi_parse_allocated_irqresource()
                  acpi_get_override_irq()
                       mp_find_ioapic()
      Signed-off-by: NPaul Bolle <pebolle@tiscali.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      678301ec
  12. 14 2月, 2011 1 次提交
  13. 12 2月, 2011 2 次提交
    • T
      x86: Readd missing irq_to_desc() in fixup_irq() · 5117348d
      Thomas Gleixner 提交于
      commit a3c08e5d(x86: Convert irq_chip access to new functions)
      accidentally zapped desc = irq_to_desc(irq); in the vector loop.
      So we lock some random irq descriptor.
      
      Add it back.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: <stable@kernel.org> # .37
      5117348d
    • P
      x86: Fix text_poke_smp_batch() deadlock · d91309f6
      Peter Zijlstra 提交于
      Fix this deadlock - we are already holding the mutex:
      
      =======================================================
      [ INFO: possible circular locking dependency detected ] 2.6.38-rc4-test+ #1
      -------------------------------------------------------
      bash/1850 is trying to acquire lock:
       (text_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
      
      but task is already holding lock:
       (smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (smp_alt){+.+...}:
             [<ffffffff81082d02>] lock_acquire+0xcd/0xf8
             [<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339
             [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
             [<ffffffff8101050f>] alternatives_smp_switch+0x77/0x1d8
             [<ffffffff81926a6f>] do_boot_cpu+0xd7/0x762
             [<ffffffff819277dd>] native_cpu_up+0xe6/0x16a
             [<ffffffff81928e28>] _cpu_up+0x9d/0xee
             [<ffffffff81928f4c>] cpu_up+0xd3/0xe7
             [<ffffffff82268d4b>] kernel_init+0xe8/0x20a
             [<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10
      
      -> #1 (cpu_hotplug.lock){+.+.+.}:
             [<ffffffff81082d02>] lock_acquire+0xcd/0xf8
             [<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339
             [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
             [<ffffffff810568cc>] get_online_cpus+0x41/0x55
             [<ffffffff810a1348>] stop_machine+0x1e/0x3e
             [<ffffffff819314c1>] text_poke_smp_batch+0x3a/0x3c
             [<ffffffff81932b6c>] arch_optimize_kprobes+0x10d/0x11c
             [<ffffffff81933a51>] kprobe_optimizer+0x152/0x222
             [<ffffffff8106bb71>] process_one_work+0x1d3/0x335
             [<ffffffff8106cfae>] worker_thread+0x104/0x1a4
             [<ffffffff810707c4>] kthread+0x9d/0xa5
             [<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10
      
      -> #0 (text_mutex){+.+.+.}:
      
      other info that might help us debug this:
      
      6 locks held by bash/1850:
       #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #1:  (s_active#75){.+.+.+}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #2:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #3:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #4:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #5:  (smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
      
      stack backtrace:
      Pid: 1850, comm: bash Not tainted 2.6.38-rc4-test+ #1
      Call Trace:
      
       [<ffffffff81080eb2>] print_circular_bug+0xa8/0xb7
       [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
       [<ffffffff81010302>] alternatives_smp_unlock+0x3d/0x93
       [<ffffffff81010630>] alternatives_smp_switch+0x198/0x1d8
       [<ffffffff8102568a>] native_cpu_die+0x65/0x95
       [<ffffffff818cc4ec>] _cpu_down+0x13e/0x202
       [<ffffffff8117a619>] sysfs_write_file+0x108/0x144
       [<ffffffff8111f5a2>] vfs_write+0xac/0xff
       [<ffffffff8111f7a9>] sys_write+0x4a/0x6e
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Tested-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: mathieu.desnoyers@efficios.com
      Cc: rusty@rustcorp.com.au
      Cc: ananth@in.ibm.com
      Cc: masami.hiramatsu.pt@hitachi.com
      Cc: fweisbec@gmail.com
      Cc: jbeulich@novell.com
      Cc: jbaron@redhat.com
      Cc: mhiramat@redhat.com
      LKML-Reference: <1297458466.5226.93.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d91309f6
  14. 10 2月, 2011 2 次提交
    • J
      x86: Fix section mismatch in LAPIC initialization · 2fb270f3
      Jan Beulich 提交于
      Additionally doing things conditionally upon smp_processor_id()
      being zero is generally a bad idea, as this means CPU 0 cannot
      be offlined and brought back online later again.
      
      While there may be other places where this is done, I think adding
      more of those should be avoided so that some day SMP can really
      become "symmetrical".
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      LKML-Reference: <4D525C7E0200007800030EE1@vpn.id2.novell.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2fb270f3
    • J
      KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index · 893a5ab6
      Joerg Roedel 提交于
      The gs_index loading code uses the swapgs instruction to
      switch to the user gs_base temporarily. This is unsave in an
      lightweight exit-path in KVM on AMD because the
      KERNEL_GS_BASE MSR is switches lazily. An NMI happening in
      the critical path of load_gs_index may use the wrong GS_BASE
      value then leading to unpredictable behavior, e.g. a
      triple-fault.
      
      This patch fixes the issue by making sure that load_gs_index
      is called only with a valid KERNEL_GS_BASE value loaded in
      KVM.
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      893a5ab6
  15. 07 2月, 2011 1 次提交
  16. 05 2月, 2011 1 次提交
  17. 04 2月, 2011 1 次提交
    • S
      x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm · 831d52bc
      Suresh Siddha 提交于
      Clearing the cpu in prev's mm_cpumask early will avoid the flush tlb
      IPI's while the cr3 is still pointing to the prev mm.  And this window
      can lead to the possibility of bogus TLB fills resulting in strange
      failures.  One such problematic scenario is mentioned below.
      
       T1. CPU-1 is context switching from mm1 to mm2 context and got a NMI
           etc between the point of clearing the cpu from the mm_cpumask(mm1)
           and before reloading the cr3 with the new mm2.
      
       T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with
           flushing the TLB for mm1.  It doesn't send the flush TLB to CPU-1
           as it doesn't see that cpu listed in the mm_cpumask(mm1).
      
       T3. After the TLB flush is complete, CPU-2 goes ahead and frees the
           page-table pages associated with the removed vma mapping.
      
       T4. CPU-2 now allocates those freed page-table pages for something
           else.
      
       T5. As the CR3 and TLB caches for mm1 is still active on CPU-1, CPU-1
           can potentially speculate and walk through the page-table caches
           and can insert new TLB entries.  As the page-table pages are
           already freed and being used on CPU-2, this page walk can
           potentially insert a bogus global TLB entry depending on the
           (random) contents of the page that is being used on CPU-2.
      
       T6. This bogus TLB entry being global will be active across future CR3
           changes and can result in weird memory corruption etc.
      
      To avoid this issue, for the prev mm that is handing over the cpu to
      another mm, clear the cpu from the mm_cpumask(prev) after the cr3 is
      changed.
      
      Marking it for -stable, though we haven't seen any reported failure that
      can be attributed to this.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org	[v2.6.32+]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      831d52bc
  18. 03 2月, 2011 2 次提交
    • S
      x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms · f7448548
      Suresh Siddha 提交于
      Markus Kohn ran into a hard hang regression on an acer aspire
      1310, when acpi is enabled. git bisect showed the following
      commit as the bad one that introduced the boot regression.
      
      	commit d0af9eed
      	Author: Suresh Siddha <suresh.b.siddha@intel.com>
      	Date:   Wed Aug 19 18:05:36 2009 -0700
      
      	    x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
      
      Because of the UP configuration of that platform,
      native_smp_prepare_cpus() bailed out (in smp_sanity_check())
      before doing the set_mtrr_aps_delayed_init()
      
      Further down the boot path, native_smp_cpus_done() will call the
      delayed MTRR initialization for the AP's (mtrr_aps_init()) with
      mtrr_aps_delayed_init not set. This resulted in the boot
      processor reprogramming its MTRR's to the values seen during the
      start of the OS boot. While this is not needed ideally, this
      shouldn't have caused any side-effects. This is because the
      reprogramming of MTRR's (set_mtrr_state() that gets called via
      set_mtrr()) will check if the live register contents are
      different from what is being asked to write and will do the actual
      write only if they are different.
      
      BP's mtrr state is read during the start of the OS boot and
      typically nothing would have changed when we ask to reprogram it
      on BP again because of the above scenario on an UP platform. So
      on a normal UP platform no reprogramming of BP MTRR MSR's
      happens and all is well.
      
      However, on this platform, bios seems to be modifying the fixed
      mtrr range registers between the start of OS boot and when we
      double check the live registers for reprogramming BP MTRR
      registers. And as the live registers are modified, we end up
      reprogramming the MTRR's to the state seen during the start of
      the OS boot.
      
      During ACPI initialization, something in the bios (probably smi
      handler?) don't like this fact and results in a hard lockup.
      
      We didn't see this boot hang issue on this platform before the
      commit d0af9eed, because only
      the AP's (if any) will program its MTRR's to the value that BP
      had at the start of the OS boot.
      
      Fix this issue by checking mtrr_aps_delayed_init before
      continuing further in the mtrr_aps_init(). Now, only AP's (if
      any) will program its MTRR's to the BP values during boot.
      
      Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393
      
        [ By the way, this behavior of the bios modifying MTRR's after the start
          of the OS boot is not common and the kernel is not prepared to
          handle this situation well. Irrespective of this issue, during
          suspend/resume, linux kernel will try to reprogram the BP's MTRR values
          to the values seen during the start of the OS boot. So suspend/resume might
          be already broken on this platform for all linux kernel versions. ]
      Reported-and-bisected-by: NMarkus Kohn <jabber@gmx.org>
      Tested-by: NMarkus Kohn <jabber@gmx.org>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Thomas Renninger <trenn@novell.com>
      Cc: Rafael Wysocki <rjw@novell.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: stable@kernel.org # [v2.6.32+]
      LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f7448548
    • M
      x86, nx: Don't force pages RW when setting NX bits · f12d3d04
      Matthieu CASTET 提交于
      Xen want page table pages read only.
      
      But the initial page table (from head_*.S) live in .data or .bss.
      
      That was broken by 64edc8ed.  There is
      absolutely no reason to force these pages RW after they have already
      been marked RO.
      Signed-off-by: NMatthieu CASTET <castet.matthieu@free.fr>
      Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      f12d3d04
  19. 28 1月, 2011 2 次提交
  20. 27 1月, 2011 2 次提交
  21. 26 1月, 2011 3 次提交
    • E
      percpu, x86: Fix percpu_xchg_op() · 889a7a6a
      Eric Dumazet 提交于
      These recent percpu commits:
      
        2485b646: x86,percpu: Move out of place 64 bit ops into X86_64 section
        8270137a: cpuops: Use cmpxchg for xchg to avoid lock semantics
      
      Caused this 'perf top' crash:
      
       Kernel panic - not syncing: Fatal exception in interrupt
       Pid: 0, comm: swapper Tainted: G     D
       2.6.38-rc2-00181-gef71723 #413 Call Trace: <IRQ> [<ffffffff810465b5>]
          ? panic
          ? kmsg_dump
          ? kmsg_dump
          ? oops_end
          ? no_context
          ? __bad_area_nosemaphore
          ? perf_output_begin
          ? bad_area_nosemaphore
          ? do_page_fault
          ? __task_pid_nr_ns
          ? perf_event_tid
          ? __perf_event_header__init_id
          ? validate_chain
          ? perf_output_sample
          ? trace_hardirqs_off
          ? page_fault
          ? irq_work_run
          ? update_process_times
          ? tick_sched_timer
          ? tick_sched_timer
          ? __run_hrtimer
          ? hrtimer_interrupt
          ? account_system_vtime
          ? smp_apic_timer_interrupt
          ? apic_timer_interrupt
       ...
      
      Looking at assembly code, I found:
      
      list = this_cpu_xchg(irq_work_list, NULL);
      
      gives this wrong code : (gcc-4.1.2 cross compiler)
      
      ffffffff810bc45e:
      	mov    %gs:0xead0,%rax
      	cmpxchg %rax,%gs:0xead0
      	jne    ffffffff810bc45e <irq_work_run+0x3e>
      	test   %rax,%rax
      	je     ffffffff810bc4aa <irq_work_run+0x8a>
      
      Tell gcc we dirty eax/rax register in percpu_xchg_op()
      
      Compiler must use another register to store pxo_new__
      
      We also dont need to reload percpu value after a jump,
      since a 'failed' cmpxchg already updated eax/rax
      
      Wrong generated code was :
      	xor     %rax,%rax   /* load 0 into %rax */
      1:	mov     %gs:0xead0,%rax
      	cmpxchg %rax,%gs:0xead0
      	jne     1b
      	test    %rax,%rax
      
      After patch :
      
      	xor     %rdx,%rdx   /* load 0 into %rdx */
      	mov     %gs:0xead0,%rax
      1:	cmpxchg %rdx,%gs:0xead0
      	jne     1b:
      	test    %rax,%rax
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      LKML-Reference: <1295973114.3588.312.camel@edumazet-laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      889a7a6a
    • Y
      x86: Remove left over system_64.h · 9a57c3e4
      Yinghai Lu 提交于
      Left-over from the x86 merge ...
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4D3E23D1.7010405@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9a57c3e4
    • A
      thp: fix PARAVIRT x86 32bit noPAE · cacf061c
      Andrea Arcangeli 提交于
      This fixes TRANSPARENT_HUGEPAGE=y with PARAVIRT=y and HIGHMEM64=n.
      
      The #ifdef that this patch removes was erratically introduced to fix a
      build error for noPAE (where pmd.pmd doesn't exist).  So then the kernel
      built but it failed at runtime because set_pmd_at was a noop.  This will
      correct it by enabling set_pmd_at for noPAE mode too.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: Nwerner <w.landgraf@ru.ru>
      Reported-by: NMinchan Kim <minchan.kim@gmail.com>
      Tested-by: NMinchan Kim <minchan.kim@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cacf061c
  22. 25 1月, 2011 1 次提交
    • J
      x86-64: Don't use pointer to out-of-scope variable in dump_trace() · 2e5aa682
      Jesper Juhl 提交于
      In arch/x86/kernel/dumpstack_64.c::dump_trace() we have this code:
      
      ...
        		if (!stack) {
        			unsigned long dummy;
        			stack = &dummy;
        			if (task && task != current)
        				stack = (unsigned long *)task->thread.sp;
        		}
      
        		bp = stack_frame(task, regs);
        		/*
        		 * Print function call entries in all stacks, starting at the
        		 * current stack address. If the stacks consist of nested
        		 * exceptions
        		 */
        		tinfo = task_thread_info(task);
      
        		for (;;) {
        			char *id;
        			unsigned long *estack_end;
        			estack_end = in_exception_stack(cpu, (unsigned long)stack,
        							&used, &id);
      ...
      
      You'll notice that we assign to 'stack' the address of the variable
      'dummy' which is only in-scope inside the 'if (!stack)'. So when we later
      access stack (at the end of the above, and assuming we did not take the
      'if (task && task != current)' branch) we'll be using the address of a
      variable that is no longer in scope. I believe this patch is the proper
      fix, but I freely admit that I'm not 100% certain.
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      LKML-Reference: <alpine.LNX.2.00.1101242232590.10252@swampdragon.chaosbits.net>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      2e5aa682
  23. 23 1月, 2011 1 次提交
    • M
      x86: Fix jump label with RO/NX module protection crash · 89696913
      matthieu castet 提交于
      If we use jump table in module init, there are marked
      as removed in __jump_table section after init is done.
      
      But we already applied ro permissions on the module, so
      we can't modify a read only section (crash in
      remove_jump_label_module_init).
      
      Make the __jump_table section rw.
      Signed-off-by: NMatthieu CASTET <castet.matthieu@free.fr>
      Cc: Xiaotian Feng <xtfeng@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Siarhei Liakh <sliakh.lkml@gmail.com>
      Cc: Xuxian Jiang <jiang@cs.ncsu.edu>
      Cc: James Morris <jmorris@namei.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Kees Cook <kees.cook@canonical.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4D3C3F20.7030203@free.fr>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      89696913
  24. 22 1月, 2011 2 次提交
    • B
      x86, hotplug: Fix powersavings with offlined cores on AMD · 93789b32
      Borislav Petkov 提交于
      ea530692 made a CPU use monitor/mwait
      when offline. This is not the optimal choice for AMD wrt to powersavings
      and we'd prefer our cores to halt (i.e. enter C1) instead. For this, the
      same selection whether to use monitor/mwait has to be used as when we
      select the idle routine for the machine.
      
      With this patch, offlining cores 1-5 on a X6 machine allows core0 to
      boost again.
      
      [ hpa: putting this in urgent since it is a (power) regression fix ]
      Reported-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Cc: stable@kernel.org # 37.x
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.hl>
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <1295534572-10730-1-git-send-email-bp@amd64.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      93789b32
    • S
      xen: p2m: correctly initialize partial p2m leaf · 8e1b4cf2
      Stefan Bader 提交于
      After changing the p2m mapping to a tree by
      
        commit 58e05027
          xen: convert p2m to a 3 level tree
      
      and trying to boot a DomU with 615MB of memory, the following crash was
      observed in the dump:
      
      kernel direct mapping tables up to 26f00000 @ 1ec4000-1fff000
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<c0107397>] xen_set_pte+0x27/0x60
      *pdpt = 0000000000000000 *pde = 0000000000000000
      
      Adding further debug statements showed that when trying to set up
      pfn=0x26700 the returned mapping was invalid.
      
      pfn=0x266ff calling set_pte(0xc1fe77f8, 0x6b3003)
      pfn=0x26700 calling set_pte(0xc1fe7800, 0x3)
      
      Although the last_pfn obtained from the startup info is 0x26700, which
      should in turn not be hit, the additional 8MB which are added as extra
      memory normally seem to be ok. This lead to looking into the initial
      p2m tree construction, which uses the smaller value and assuming that
      there is other code handling the extra memory.
      
      When the p2m tree is set up, the leaves are directly pointed to the
      array which the domain builder set up. But if the mapping is not on a
      boundary that fits into one p2m page, this will result in the last leaf
      being only partially valid. And as the invalid entries are not
      initialized in that case, things go badly wrong.
      
      I am trying to fix that by checking whether the current leaf is a
      complete map and if not, allocate a completely new page and copy only
      the valid pointers there. This may not be the most efficient or elegant
      solution, but at least it seems to allow me booting DomUs with memory
      assignments all over the range.
      
      BugLink: http://bugs.launchpad.net/bugs/686692
      [v2: Redid a bit of commit wording and fixed a compile warning]
      Signed-off-by: NStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      8e1b4cf2
  25. 21 1月, 2011 4 次提交
  26. 20 1月, 2011 1 次提交
    • T
      lockdep: Move early boot local IRQ enable/disable status to init/main.c · 2ce802f6
      Tejun Heo 提交于
      During early boot, local IRQ is disabled until IRQ subsystem is
      properly initialized.  During this time, no one should enable
      local IRQ and some operations which usually are not allowed with
      IRQ disabled, e.g. operations which might sleep or require
      communications with other processors, are allowed.
      
      lockdep tracked this with early_boot_irqs_off/on() callbacks.
      As other subsystems need this information too, move it to
      init/main.c and make it generally available.  While at it,
      toggle the boolean to early_boot_irqs_disabled instead of
      enabled so that it can be initialized with %false and %true
      indicates the exceptional condition.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2ce802f6