1. 22 Jul, 2010 2 commits
  2. 21 Jul, 2010 1 commit
    • x86, numa: fix boot without RAM on node0 again · 9aebbdb6
      Committed by Yinghai Lu
      Commit e534c7c5 ("numa: x86_64: use generic percpu var
      numa_node_id() implementation") broke NUMA systems that don't have RAM
      on node0 when MEMORY_HOTPLUG is enabled, because cpu_up() will call
      cpu_to_node() before per_cpu(numa_node) is set up for APs.
      
      When node0 doesn't have RAM, x86 already rounds its CPUs to the nearest
      node with RAM in x86_cpu_to_node_map, but per_cpu(numa_node) is not set
      up for APs until c_init.
      
      When cpu_up() later calls cpu_to_node(), it gets 0 again and brings node0
      online even though there is no RAM on it.  As a result, none of the APs
      can be booted, and a panic follows later.
      
      [    1.611101] On node 0 totalpages: 0
      .........
      [    2.608558] On node 0 totalpages: 0
      [    2.612065] Brought up 1 CPUs
      [    2.615199] Total of 1 processors activated (3990.31 BogoMIPS).
      ...
      [   93.225341] calling  loop_init+0x0/0x1a4 @ 1
      [   93.229314] PERCPU: allocation failed, size=80 align=8, failed to populate
      [   93.246539] Pid: 1, comm: swapper Tainted: G        W   2.6.35-rc4-tip-yh-04371-gd64e6c4-dirty #354
      [   93.264621] Call Trace:
      [   93.266533]  [<ffffffff81125e43>] pcpu_alloc+0x83a/0x8e7
      [   93.270710]  [<ffffffff81125f15>] __alloc_percpu+0x10/0x12
      [   93.285849]  [<ffffffff8140786c>] alloc_disk_node+0x94/0x16d
      [   93.291811]  [<ffffffff81407956>] alloc_disk+0x11/0x13
      [   93.306157]  [<ffffffff81503e51>] loop_alloc+0xa7/0x180
      [   93.310538]  [<ffffffff8277ef48>] loop_init+0x9b/0x1a4
      [   93.324909]  [<ffffffff8277eead>] ? loop_init+0x0/0x1a4
      [   93.329650]  [<ffffffff810001f2>] do_one_initcall+0x57/0x136
      [   93.345197]  [<ffffffff827486d0>] kernel_init+0x184/0x20e
      [   93.348146]  [<ffffffff81034954>] kernel_thread_helper+0x4/0x10
      [   93.365194]  [<ffffffff81c7cc3c>] ? restore_args+0x0/0x30
      [   93.369305]  [<ffffffff8274854c>] ? kernel_init+0x0/0x20e
      [   93.386011]  [<ffffffff81034950>] ? kernel_thread_helper+0x0/0x10
      [   93.392047] loop: out of memory
      ...
      
      Try to assign per_cpu(numa_node) early
      
      [akpm@linux-foundation.org: tidy up code comment]
      Signed-off-by: Yinghai <yinghai@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 19 Jul, 2010 2 commits
  4. 17 Jul, 2010 3 commits
    • x86, pci, mrst: Add extra sanity check in walking the PCI extended cap chain · f82c3d71
      Committed by Jacob Pan
      The fixed BAR capability structure is searched for in PCI extended
      configuration space.  We need to make sure there is a valid capability
      ID to begin with; otherwise, the search code may get stuck in an infinite
      loop, which results in a boot hang.  This patch adds an additional check for
      cap ID 0, which is also invalid and indicates the end of the chain.
      
      End of chain is supposed to have all fields zero, but that doesn't
      seem to always be the case in the field.
      Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      LKML-Reference: <1279306706-27087-1-git-send-email-jacob.jun.pan@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
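The chain walk described in this commit can be sketched in user space as follows. This is a minimal model, not the kernel code: the configuration space is a plain array, `find_ext_cap()` is a hypothetical helper name, and the hop budget is an extra guard added here that the commit message does not mention. The header layout (ID in bits 15:0, next pointer in bits 31:20) follows the PCIe extended capability format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE 4096u  /* PCIe extended configuration space, in bytes */
#define EXT_CAP_START  0x100u /* extended capabilities start at this offset */

/* Field extraction for a 32-bit extended capability header. */
#define EXT_CAP_ID(h)   ((uint16_t)((h) & 0xffffu))
#define EXT_CAP_NEXT(h) (((h) >> 20) & 0xffcu)

/*
 * Walk the extended capability chain in a simulated config space (an array
 * of dwords) and return the offset of the capability with ID want_id, or 0
 * if it is not found.
 */
static unsigned int find_ext_cap(const uint32_t *cfg, uint16_t want_id)
{
	unsigned int pos = EXT_CAP_START;
	/* Extra guard (an assumption of this sketch): bound the number of hops. */
	unsigned int budget = (CFG_SPACE_SIZE - EXT_CAP_START) / 4;

	assert(cfg != NULL);

	while (pos >= EXT_CAP_START && pos < CFG_SPACE_SIZE && budget--) {
		uint32_t header = cfg[pos / 4];
		uint16_t id = EXT_CAP_ID(header);

		/* ID 0xffff means no device; ID 0 is also invalid: end of chain. */
		if (id == 0x0000 || id == 0xffff)
			break;
		if (id == want_id)
			return pos;
		pos = EXT_CAP_NEXT(header);
	}
	return 0;
}
```

With the ID 0 check in place, a header whose ID is zero but whose next pointer is non-zero (the "in the field" case from the message) terminates the search instead of being followed.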
    • x86: Fix x2apic preenabled system with kexec · fd19dce7
      Committed by Yinghai Lu
      Found that the kexec loop test failed on one x2apic system
      when CONFIG_NMI_WATCHDOG=y (old) or CONFIG_LOCKUP_DETECTOR=y (current tip).
      
      The first kernel can kexec the second kernel, but the second kernel cannot
      kexec the third one.
      
      It can be reproduced on another system with BIOS pre-enabled x2apic.
      There, the first kernel cannot kexec the second kernel.
      
      It turns out that when the kernel boots with pre-enabled x2apic, it does not
      execute disable_local_APIC() on the shutdown path.
      
      When init_apic_mappings() is called in setup_arch(), it skips setting
      apic_phys when x2apic_mode is set (x2apic_mode is set much earlier, in
      check_x2apic()). Later, disable_local_APIC() bails out early because of
      !apic_phys.
      
      So check !x2apic_mode together with !apic_phys in disable_local_APIC().
      
      Another solution could be updating init_apic_mappings() to set apic_phys even
      for pre-enabled x2apic systems. Actually, even on x2apic systems, the lapic
      address is already mapped at an early stage.
      
      BTW: is there any x2apic pre-enabled system with apicid of boot cpu > 255?
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4C3EB22B.3000701@kernel.org>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: stable@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
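The effect of the changed bail-out condition can be illustrated with a small user-space model. The names `x2apic_mode` and `apic_phys` come from the commit message; `struct apic_state` and the two `model_` functions are hypothetical stand-ins for this sketch and do not reproduce the kernel's disable_local_APIC().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state that the shutdown path looks at. */
struct apic_state {
	bool x2apic_mode;        /* set early when the BIOS pre-enabled x2apic */
	unsigned long apic_phys; /* stays 0 when the mapping step was skipped */
	bool enabled;            /* whether the local APIC is still enabled */
};

/* Old check: bails out whenever apic_phys is unset. */
static bool model_disable_old(struct apic_state *s)
{
	assert(s != NULL);
	if (!s->apic_phys)
		return false;
	s->enabled = false;
	return true;
}

/* New check: only bail out if we are not in x2apic mode either. */
static bool model_disable_new(struct apic_state *s)
{
	assert(s != NULL);
	if (!s->x2apic_mode && !s->apic_phys)
		return false;
	s->enabled = false;
	return true;
}
```

On a pre-enabled x2apic system (`x2apic_mode` set, `apic_phys` zero), the old check leaves the APIC enabled across kexec, while the new check proceeds to disable it.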
    • PCI: fall back to original BIOS BAR addresses · 58c84eda
      Committed by Bjorn Helgaas
      If we fail to assign resources to a PCI BAR, this patch makes us try the
      original address from BIOS rather than leaving it disabled.
      
      Linux tries to make sure all PCI device BARs are inside the upstream
      PCI host bridge or P2P bridge apertures, reassigning BARs if necessary.
      Windows does similar reassignment.
      
      Before this patch, if we could not move a BAR into an aperture, we left
      the resource unassigned, i.e., at address zero.  Windows leaves such BARs
      at the original BIOS addresses, and this patch makes Linux do the same.
      
      This is a bit ugly because we disable the resource long before we try to
      reassign it, so we have to keep track of the BIOS BAR address somewhere.
      For lack of a better place, I put it in the struct pci_dev.
      
      I think it would be cleaner to attempt the assignment immediately when the
      claim fails, so we could easily remember the original address.  But we
      currently claim motherboard resources in the middle, after attempting to
      claim PCI resources and before assigning new PCI resources, and changing
      that is a fairly big job.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16263
      
      Reported-by: Andrew <nitr0@seti.kr.ua>
      Tested-by: Andrew <nitr0@seti.kr.ua>
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
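The fallback order described above (try a normal assignment first, then the remembered BIOS address) might look roughly like this in a simplified model. All types and function names here are hypothetical; the allocator is a trivial bump allocator that ignores alignment, and the conflict check against already claimed ranges is an assumption of this sketch rather than something the commit message spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };             /* inclusive bounds */
struct aperture { uint64_t next_free, end; };      /* upstream bridge window */
struct bar {
	uint64_t size;
	uint64_t fw_addr; /* address originally programmed by the BIOS */
	uint64_t addr;    /* 0 means unassigned */
};

static bool overlaps(uint64_t a_start, uint64_t a_end,
		     uint64_t b_start, uint64_t b_end)
{
	return a_start <= b_end && b_start <= a_end;
}

/* Step 1: try to place the BAR inside the bridge aperture. */
static bool assign_in_aperture(struct bar *b, struct aperture *ap)
{
	if (b->size == 0 || ap->next_free > ap->end ||
	    ap->end - ap->next_free < b->size - 1)
		return false;
	b->addr = ap->next_free;
	ap->next_free += b->size;
	return true;
}

/* Step 2: on failure, fall back to the BIOS address if nothing else uses it. */
static bool assign_bar(struct bar *b, struct aperture *ap,
		       const struct range *claimed, size_t nclaimed)
{
	assert(b != NULL && ap != NULL);

	if (assign_in_aperture(b, ap))
		return true;
	if (b->fw_addr == 0 || b->size == 0)
		return false;

	uint64_t fw_end = b->fw_addr + b->size - 1;
	for (size_t i = 0; i < nclaimed; i++)
		if (overlaps(b->fw_addr, fw_end, claimed[i].start, claimed[i].end))
			return false; /* leave the BAR unassigned */

	b->addr = b->fw_addr;
	return true;
}
```

The `fw_addr` field plays the role of the BIOS BAR address that the commit stores in struct pci_dev, since the resource is disabled long before reassignment is attempted.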
  5. 15 Jul, 2010 1 commit
  6. 13 Jul, 2010 1 commit
  7. 08 Jul, 2010 3 commits
  8. 06 Jul, 2010 1 commit
    • KVM: VMX: Fix host MSR_KERNEL_GS_BASE corruption · da38f438
      Committed by Avi Kivity
      enter_lmode() and exit_lmode() modify the guest's EFER.LMA before calling
      vmx_set_efer().  However, the latter function depends on the value of EFER.LMA
      to determine whether MSR_KERNEL_GS_BASE needs reloading, via
      vmx_load_host_state().  With EFER.LMA changing under its feet, it took the
      wrong choice and corrupted userspace's %gs.
      
      This causes 32-on-64 host userspace to fault.
      
      Fix by not touching EFER.LMA; instead ask vmx_set_efer() to change it.
      Signed-off-by: Avi Kivity <avi@redhat.com>
  9. 05 Jul, 2010 1 commit
    • rbtree: Undo augmented trees performance damage and regression · b945d6b2
      Committed by Peter Zijlstra
      Reimplement augmented RB-trees without sprinkling extra branches
      all over the RB-tree code (which lives in the scheduler hot path).
      
      This approach is 'borrowed' from Fabio's BFQ implementation and
      relies on traversing the rebalance path after the RB-tree-op to
      correct the heap property for insertion/removal and make up for
      the damage done by the tree rotations.
      
      For insertion the rebalance path is trivially that from the new
      node upwards to the root, for removal it is that from the deepest
      node in the path from the to be removed node that will still
      be around after the removal.
      
      [ This patch also fixes a video driver regression reported by
        Ali Gholami Rudi - the memtype->subtree_max_end was updated
        incorrectly. ]
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Ali Gholami Rudi <ali@rudi.ir>
      Cc: Fabio Checconi <fabio@gandalf.sssup.it>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1275414172.27810.27961.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
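The idea of fixing up the augmented value by walking the path to the root after the tree operation can be sketched on a plain, unbalanced binary search tree. This is a deliberate simplification: there are no rotations or colors here, and `struct node`, `augment_path()` and `interval_insert()` are hypothetical names for this sketch, not the kernel's rbtree API. Only `subtree_max_end` is borrowed from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interval node keyed by start; subtree_max_end is the augmented value. */
struct node {
	uint64_t start, end;
	uint64_t subtree_max_end;
	struct node *parent, *left, *right;
};

/* Recompute the augmented value of one node from itself and its children. */
static uint64_t compute_max_end(const struct node *n)
{
	uint64_t max = n->end;

	if (n->left && n->left->subtree_max_end > max)
		max = n->left->subtree_max_end;
	if (n->right && n->right->subtree_max_end > max)
		max = n->right->subtree_max_end;
	return max;
}

/* Walk from n up to the root, repairing the augmented value on the way. */
static void augment_path(struct node *n)
{
	for (; n; n = n->parent)
		n->subtree_max_end = compute_max_end(n);
}

/* Plain BST insert with no augmentation logic inside the descent loop. */
static void interval_insert(struct node **root, struct node *new_node)
{
	struct node **link = root, *parent = NULL;

	assert(root != NULL && new_node != NULL);

	while (*link) {
		parent = *link;
		link = new_node->start < parent->start ? &parent->left
						       : &parent->right;
	}
	new_node->parent = parent;
	new_node->left = new_node->right = NULL;
	new_node->subtree_max_end = new_node->end;
	*link = new_node;

	/* The fix-up happens once, after the tree operation. */
	augment_path(new_node);
}
```

Keeping the fix-up in a separate pass is what lets the core insert loop stay free of extra branches, which is the design point the commit message makes for the real RB-tree code.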
  10. 03 Jul, 2010 1 commit
  11. 01 Jul, 2010 1 commit
  12. 30 Jun, 2010 1 commit
    • x86: Send a SIGTRAP for user icebp traps · a1e80faf
      Committed by Frederic Weisbecker
      Before we had a generic breakpoint layer, x86 used to send a
      sigtrap for any debug event that happened in userspace,
      except if it was caused by lazy dr7 switches.
      
      Currently we only send such signal for single step or breakpoint
      events.
      
      However, there are three other kinds of debug exceptions:
      
      - debug register access detected: trigger an exception if the
        next instruction touches the debug registers. We don't use
        it.
      - task switch, but we don't use tss.
      - icebp/int01 trap. This instruction (0xf1) is undocumented and
        generates an int 1 exception. Unlike single step through TF
        flag, it doesn't set the single step origin of the exception
        in dr6.
      
      icebp used to be reported to userspace using trap signals,
      but this was accidentally broken by the new breakpoint
      code. Re-enable this. Since this is the only debug event that
      doesn't set anything in dr6, this is all we have to check.
      
      This fixes a regression in Wine where World Of Warcraft got broken,
      as it uses this for software protection checks. Other apps
      probably do too.
      Reported-and-tested-by: Alexandre Julliard <julliard@winehq.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: 2.6.33.x 2.6.34.x <stable@kernel.org>
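The decision described above can be modeled as a small predicate. This is a sketch under assumptions: `should_send_sigtrap()` is a hypothetical function, the input `dr6` is taken to have its reserved bits already cleared, and the bit masks follow the architectural DR6 layout (B0 to B3 in bits 0 to 3, BD in bit 13, BS in bit 14).

```c
#include <assert.h>
#include <stdbool.h>

#define DR6_TRAP_BITS 0x000fUL /* B0..B3: a hardware breakpoint fired */
#define DR6_BD        0x2000UL /* debug register access detected */
#define DR6_STEP      0x4000UL /* BS: single step through the TF flag */

/*
 * Decide whether a debug exception should raise SIGTRAP. An exception
 * from user mode with nothing set in dr6 is treated as an icebp/int01 trap.
 */
static bool should_send_sigtrap(unsigned long dr6, bool from_user_mode)
{
	bool user_icebp = (dr6 == 0) && from_user_mode;

	return (dr6 & (DR6_STEP | DR6_TRAP_BITS)) != 0 || user_icebp;
}
```

For example, `should_send_sigtrap(0, true)` is true, which is the icebp case that regressed, while `should_send_sigtrap(DR6_BD, true)` stays false because debug register access detection is not used.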
  13. 25 Jun, 2010 1 commit
  14. 20 Jun, 2010 1 commit
  15. 19 Jun, 2010 1 commit
  16. 12 Jun, 2010 2 commits
  17. 11 Jun, 2010 2 commits
  18. 10 Jun, 2010 3 commits
  19. 09 Jun, 2010 4 commits
  20. 08 Jun, 2010 2 commits
  21. 03 Jun, 2010 1 commit
    • xen: ensure timer tick is resumed even on CPU driving the resume · cd52e17e
      Committed by Ian Campbell
      The core suspend/resume code is run from stop_machine on CPU0 but
      parts of the suspend/resume machinery (including xen_arch_resume) are
      run on whichever CPU happened to schedule the xenwatch kernel thread.
      
      As part of the non-core resume code xen_arch_resume is called in order
      to restart the timer tick on non-boot processors. The boot processor
      itself is taken care of by core timekeeping code.
      
      xen_arch_resume uses smp_call_function which does not call the given
      function on the current processor. This means that we can end up with
      one CPU not receiving timer ticks if the xenwatch thread happened to
      be scheduled on CPU > 0.
      
      Use on_each_cpu instead of smp_call_function to ensure the timer tick
      is resumed everywhere.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Stable Kernel <stable@kernel.org> # .32.x
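The difference between the two calls can be shown with a toy model of four CPUs. The `model_` functions below are hypothetical stand-ins that only mimic which CPUs get the callback; the real smp_call_function() and on_each_cpu() take different arguments and run the function via IPIs.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

static bool tick_running[MODEL_NR_CPUS];

typedef void (*cpu_fn)(int cpu);

/* Callback: restart the timer tick on one CPU of the model. */
static void resume_tick(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	tick_running[cpu] = true;
}

static void reset_ticks(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		tick_running[cpu] = false;
}

/* Mimics smp_call_function(): every CPU except the calling one. */
static void model_smp_call_function(int current_cpu, cpu_fn fn)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu != current_cpu)
			fn(cpu);
}

/* Mimics on_each_cpu(): every CPU, including the calling one. */
static void model_on_each_cpu(cpu_fn fn)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		fn(cpu);
}
```

If the xenwatch thread happens to run on CPU 2, `model_smp_call_function(2, resume_tick)` leaves `tick_running[2]` false, which corresponds to the one CPU that never receives timer ticks again.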
  22. 02 Jun, 2010 1 commit
    • x86, smpboot: Fix cores per node printing on boot · 4adc8b71
      Committed by Borislav Petkov
      Percpu initialization happens now after booting the cores on the
      machine and this causes them all to be displayed as belonging to
      node 0:
      
      Jun  8 05:57:21 kepek kernel: [    0.106999] Booting Node   0,
      Processors  #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 Ok.
      
      Use early_cpu_to_node() to get the correct node of each core
      instead.
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Mike Travis <travis@sgi.com>
      LKML-Reference: <20100601190455.GA14237@aftab>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  23. 01 Jun, 2010 2 commits
  24. 31 May, 2010 2 commits
    • x86/mm: Remove unused DBG() macro · e565813a
      Committed by Akinobu Mita
      DBG() macro for CONFIG_DEBUG_PER_CPU_MAPS is unused.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      LKML-Reference: <1274706291-13554-1-git-send-email-akinobu.mita@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events: Fix event scheduling issues introduced by transactional API · 90151c35
      Committed by Stephane Eranian
      The transactional API patch between the generic and model-specific
      code introduced several important bugs with event scheduling, at
      least on X86. If you had pinned events, e.g., watchdog, and were
      over-committing the PMU, you would get bogus counts. The bug was
      showing up on Intel CPUs because events would move around more
      often than on AMD. But the problem also existed on AMD, though
      it was harder to expose.
      
      The issues were:
      
       - group_sched_in() was missing a cancel_txn() in the error path
      
       - cpuc->n_added was not properly maintained, leading to missing
         actions in hw_perf_enable(), i.e., n_running being 0. You cannot
         update n_added until you know the transaction has succeeded. In
         case of failed transaction n_added was not adjusted back.
      
       - in case of failed transactions, event_sched_out() was called
         and eventually invoked x86_disable_event() to touch the HW reg.
         But with transactions, on X86, event_sched_in() does not touch
         HW registers, it simply collects events into a list. Thus, you
         could end up calling x86_disable_event() on a counter which
         did not correspond to the current event when idx != -1.
      
      The patch modifies the generic and X86 code to avoid all those problems.
      
      First, we keep track of the number of events added last. In case the
      transaction fails, we subtract them from n_added. This approach is
      necessary (as opposed to delaying updates to n_added) because not all
      event updates use the transaction API, e.g., single events.
      
      Second, we encapsulate the event_sched_in() and event_sched_out() in
      group_sched_in() inside the transaction. That makes the operations
      symmetrical and you can also detect that you are inside a transaction
      and skip the HW reg access by checking cpuc->group_flag.
      
      With this patch, you can now overcommit the PMU even with pinned
      system-wide events present and still get valid counts.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1274796225.5882.1389.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
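The bookkeeping described in this commit (count what the last transaction added, and take it back on failure) can be sketched with a toy model. Everything below is hypothetical naming for illustration: `struct cpu_events`, `n_txn`, and the `model_` functions are not the kernel's API, and rolling back `n_events` alongside `n_added` is an assumption of this sketch, since the message only mentions `n_added`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_EVENTS 8 /* capacity of the collected event list */

struct cpu_events {
	int n_events; /* events currently collected */
	int n_added;  /* events added since the PMU was last enabled */
	int n_txn;    /* events added since the current transaction began */
};

static void model_start_txn(struct cpu_events *c)
{
	c->n_txn = 0;
}

/* Collect one event; nothing touches hardware registers at this point. */
static bool model_add_event(struct cpu_events *c)
{
	if (c->n_events >= MODEL_MAX_EVENTS)
		return false;
	c->n_events++;
	c->n_added++;
	c->n_txn++;
	return true;
}

/* The schedulability check is modeled as a simple counter limit. */
static bool model_commit_txn(const struct cpu_events *c, int hw_counters)
{
	return c->n_events <= hw_counters;
}

/* Undo exactly what this transaction added. */
static void model_cancel_txn(struct cpu_events *c)
{
	c->n_added -= c->n_txn;
	c->n_events -= c->n_txn;
	c->n_txn = 0;
}

/* Schedule a whole group in, or leave the counts as they were. */
static bool model_group_sched_in(struct cpu_events *c, int group_size,
				 int hw_counters)
{
	assert(c != NULL);

	model_start_txn(c);
	for (int i = 0; i < group_size; i++)
		if (!model_add_event(c))
			goto fail;
	if (!model_commit_txn(c, hw_counters))
		goto fail;
	return true;
fail:
	model_cancel_txn(c); /* the error path that was missing */
	return false;
}
```

With four counters, scheduling a group of two succeeds and a following group of three fails; after the failure both `n_events` and `n_added` are back at 2, so later enable logic sees consistent counts.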