1. 02 Aug, 2010 1 commit
  2. 22 Jul, 2010 2 commits
  3. 21 Jul, 2010 1 commit
    • Y
      x86, numa: fix boot without RAM on node0 again · 9aebbdb6
      Yinghai Lu committed
      Commit e534c7c5 ("numa: x86_64: use generic percpu var
      numa_node_id() implementation") broke NUMA systems that don't have RAM
      on node0 when MEMORY_HOTPLUG is enabled, because cpu_up() will call
      cpu_to_node() before per_cpu(numa_node) is set up for APs.
      
      When node0 doesn't have RAM, x86 already rounds its CPUs to the nearest
      node with RAM in x86_cpu_to_node_map, but per_cpu(numa_node) is not set
      up until c_init for APs.
      
      When cpu_up() later calls cpu_to_node(), it gets 0 again and brings
      node0 online even though there is no RAM on it.  As a result, none of
      the APs can be booted, and the kernel panics later.
      
      [    1.611101] On node 0 totalpages: 0
      .........
      [    2.608558] On node 0 totalpages: 0
      [    2.612065] Brought up 1 CPUs
      [    2.615199] Total of 1 processors activated (3990.31 BogoMIPS).
      ...
      [   93.225341] calling  loop_init+0x0/0x1a4 @ 1
      [   93.229314] PERCPU: allocation failed, size=80 align=8, failed to populate
      [   93.246539] Pid: 1, comm: swapper Tainted: G        W   2.6.35-rc4-tip-yh-04371-gd64e6c4-dirty #354
      [   93.264621] Call Trace:
      [   93.266533]  [<ffffffff81125e43>] pcpu_alloc+0x83a/0x8e7
      [   93.270710]  [<ffffffff81125f15>] __alloc_percpu+0x10/0x12
      [   93.285849]  [<ffffffff8140786c>] alloc_disk_node+0x94/0x16d
      [   93.291811]  [<ffffffff81407956>] alloc_disk+0x11/0x13
      [   93.306157]  [<ffffffff81503e51>] loop_alloc+0xa7/0x180
      [   93.310538]  [<ffffffff8277ef48>] loop_init+0x9b/0x1a4
      [   93.324909]  [<ffffffff8277eead>] ? loop_init+0x0/0x1a4
      [   93.329650]  [<ffffffff810001f2>] do_one_initcall+0x57/0x136
      [   93.345197]  [<ffffffff827486d0>] kernel_init+0x184/0x20e
      [   93.348146]  [<ffffffff81034954>] kernel_thread_helper+0x4/0x10
      [   93.365194]  [<ffffffff81c7cc3c>] ? restore_args+0x0/0x30
      [   93.369305]  [<ffffffff8274854c>] ? kernel_init+0x0/0x20e
      [   93.386011]  [<ffffffff81034950>] ? kernel_thread_helper+0x0/0x10
      [   93.392047] loop: out of memory
      ...
      
      Try to assign per_cpu(numa_node) early
      
      [akpm@linux-foundation.org: tidy up code comment]
      Signed-off-by: Yinghai <yinghai@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
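The ordering problem described in this commit can be modelled in a few lines of userspace C. This is only a sketch: `early_cpu_to_node`, `numa_node`, and `assign_numa_nodes_early()` are hypothetical stand-ins for x86_cpu_to_node_map, per_cpu(numa_node), and the early assignment the patch introduces; they are not the kernel's actual code.

```c
#define NR_CPUS 4

/* Hypothetical early map: node 0 has no RAM, so no CPU is mapped to it. */
static const int early_cpu_to_node[NR_CPUS] = { 1, 1, 2, 2 };

/* Models per_cpu(numa_node); zero-initialised, so every CPU starts on node 0. */
static int numa_node[NR_CPUS];

/* Models cpu_to_node() after it was switched to the per-CPU variable. */
static int cpu_to_node(int cpu)
{
    return numa_node[cpu];
}

/* Models the fix: copy the early map before any CPU is brought up. */
static void assign_numa_nodes_early(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        numa_node[cpu] = early_cpu_to_node[cpu];
}
```

Until `assign_numa_nodes_early()` runs, `cpu_to_node()` reports node 0 for every CPU, which is the stale answer that cpu_up() acted on.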
  4. 19 Jul, 2010 2 commits
  5. 17 Jul, 2010 3 commits
    • J
      x86, pci, mrst: Add extra sanity check in walking the PCI extended cap chain · f82c3d71
      Jacob Pan committed
      The fixed BAR capability structure is searched for in PCI extended
      configuration space.  We need to make sure there is a valid capability
      ID to begin with; otherwise, the search code may get stuck in an
      infinite loop, which results in a boot hang.  This patch adds an
      additional check for cap ID 0, which is also invalid and indicates the
      end of the chain.
      
      End of chain is supposed to have all fields zero, but that doesn't
      seem to always be the case in the field.
      Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      LKML-Reference: <1279306706-27087-1-git-send-email-jacob.jun.pan@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
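A defensive walk of the extended capability chain might look like the sketch below. It operates on an in-memory snapshot of configuration space rather than real hardware; `cfg_read32()` and `find_ext_cap()` are hypothetical helpers, and the iteration bound is an extra safeguard that the commit message does not mention. The header layout (capability ID in bits 15:0, next offset in bits 31:20, chain starting at offset 0x100) follows the PCI Express extended capability format.

```c
#include <stdint.h>

#define CFG_SPACE_SIZE  4096u   /* PCIe extended configuration space */
#define EXT_CAP_START   0x100u  /* offset of the first extended capability */
#define EXT_CAP_ID(h)   ((h) & 0xffffu)
#define EXT_CAP_NEXT(h) (((h) >> 20) & 0xffcu)

/* Hypothetical accessor: read one dword from a config-space snapshot. */
static uint32_t cfg_read32(const uint32_t *cfg, unsigned int pos)
{
    return cfg[pos / 4];
}

/* Return the offset of the capability with ID want_id, or 0 if not found. */
static unsigned int find_ext_cap(const uint32_t *cfg, unsigned int want_id)
{
    unsigned int pos = EXT_CAP_START;
    /* Each capability takes at least 8 bytes, which bounds the walk. */
    int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;

    while (pos >= EXT_CAP_START && pos < CFG_SPACE_SIZE && ttl-- > 0) {
        uint32_t header = cfg_read32(cfg, pos);

        /* All-ones means nothing is there; cap ID 0 ends the chain. */
        if (header == 0xffffffffu || EXT_CAP_ID(header) == 0)
            break;
        if (EXT_CAP_ID(header) == want_id)
            return pos;
        pos = EXT_CAP_NEXT(header);
    }
    return 0;
}
```

Checking the ID rather than the whole header matters for the case the commit describes: an end-of-chain entry whose ID is 0 but whose other fields are not all zero.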
    • Y
      x86: Fix x2apic preenabled system with kexec · fd19dce7
      Yinghai Lu committed
      Found one x2apic system where the kexec loop test failed
      when CONFIG_NMI_WATCHDOG=y (old) or CONFIG_LOCKUP_DETECTOR=y (current tip).
      
      The first kernel can kexec the second kernel, but the second kernel cannot
      kexec the third one.
      
      It can be reproduced on another system with BIOS pre-enabled x2apic, where
      the first kernel cannot kexec the second kernel.
      
      It turns out that when the kernel boots with pre-enabled x2apic, it will not
      execute disable_local_APIC() on the shutdown path.
      
      When init_apic_mappings() is called in setup_arch(), it skips setting
      apic_phys when x2apic_mode is set (x2apic_mode is set much earlier, in
      check_x2apic()).  Later, disable_local_APIC() bails out early because
      !apic_phys.
      
      So check !x2apic_mode together with !apic_phys in disable_local_APIC().
      
      Another solution could be updating init_apic_mappings() to set apic_phys even
      for pre-enabled x2apic systems.  Actually, even for x2apic systems, the lapic
      address is already mapped at an early stage.
      
      BTW: is there any x2apic pre-enabled system with an apicid of the boot cpu > 255?
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4C3EB22B.3000701@kernel.org>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: stable@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
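The change in the early bail-out condition can be illustrated with a small userspace model. The struct and the two predicate functions are hypothetical; only the names x2apic_mode and apic_phys come from the commit message.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel globals named in the commit message. */
struct apic_state {
    bool x2apic_mode;        /* x2apic is enabled, possibly by the BIOS */
    unsigned long apic_phys; /* xAPIC MMIO address; 0 if it was never set */
};

/* Old behaviour: skip the disable path whenever apic_phys is unset. */
static bool may_disable_old(const struct apic_state *s)
{
    return s->apic_phys != 0;
}

/* New behaviour: skip only if we are not in x2apic mode AND apic_phys is unset. */
static bool may_disable_new(const struct apic_state *s)
{
    return s->x2apic_mode || s->apic_phys != 0;
}
```

For a BIOS pre-enabled x2apic system (`x2apic_mode` true, `apic_phys` zero) the old predicate refuses to proceed while the new one allows the local APIC to be disabled before kexec.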
    • B
      PCI: fall back to original BIOS BAR addresses · 58c84eda
      Bjorn Helgaas committed
      If we fail to assign resources to a PCI BAR, this patch makes us try the
      original address from BIOS rather than leaving it disabled.
      
      Linux tries to make sure all PCI device BARs are inside the upstream
      PCI host bridge or P2P bridge apertures, reassigning BARs if necessary.
      Windows does similar reassignment.
      
      Before this patch, if we could not move a BAR into an aperture, we left
      the resource unassigned, i.e., at address zero.  Windows leaves such BARs
      at the original BIOS addresses, and this patch makes Linux do the same.
      
      This is a bit ugly because we disable the resource long before we try to
      reassign it, so we have to keep track of the BIOS BAR address somewhere.
      For lack of a better place, I put it in the struct pci_dev.
      
      I think it would be cleaner to attempt the assignment immediately when the
      claim fails, so we could easily remember the original address.  But we
      currently claim motherboard resources in the middle, after attempting to
      claim PCI resources and before assigning new PCI resources, and changing
      that is a fairly big job.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16263
      Reported-by: Andrew <nitr0@seti.kr.ua>
      Tested-by: Andrew <nitr0@seti.kr.ua>
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
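The fallback policy can be sketched as follows. This is a simplified model, not the kernel's resource code: `struct bar`, `struct aperture`, the bump allocator, and `assign_bar()` are all hypothetical, BAR sizes are assumed to be powers of two, and the sketch does not check whether the BIOS address conflicts with another resource.

```c
#include <stdbool.h>
#include <stdint.h>

struct bar {
    uint64_t start;   /* current address; 0 means unassigned */
    uint64_t size;    /* assumed to be a power of two */
    uint64_t fw_addr; /* address the BIOS programmed, saved before disabling */
};

struct aperture {
    uint64_t start, end; /* inclusive bridge window */
    uint64_t next_free;  /* naive bump-allocator cursor */
};

/* Try to place the BAR inside the upstream bridge aperture. */
static bool assign_in_aperture(struct aperture *ap, struct bar *b)
{
    uint64_t addr = (ap->next_free + b->size - 1) & ~(b->size - 1);

    if (addr + b->size - 1 > ap->end)
        return false;
    b->start = addr;
    ap->next_free = addr + b->size;
    return true;
}

/* Assign a BAR, reverting to the BIOS address if the aperture is full. */
static bool assign_bar(struct aperture *ap, struct bar *b)
{
    if (assign_in_aperture(ap, b))
        return true;
    if (b->fw_addr != 0) {
        b->start = b->fw_addr; /* fall back to the original BIOS address */
        return true;
    }
    b->start = 0;              /* leave it unassigned, as before the patch */
    return false;
}
```

Keeping `fw_addr` in the per-device structure mirrors the commit's remark that the BIOS address has to be remembered somewhere between disabling the resource and reassigning it.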
  6. 15 Jul, 2010 1 commit
  7. 13 Jul, 2010 1 commit
  8. 08 Jul, 2010 3 commits
  9. 06 Jul, 2010 1 commit
    • A
      KVM: VMX: Fix host MSR_KERNEL_GS_BASE corruption · da38f438
      Avi Kivity committed
      enter_lmode() and exit_lmode() modify the guest's EFER.LMA before calling
      vmx_set_efer().  However, the latter function depends on the value of EFER.LMA
      to determine whether MSR_KERNEL_GS_BASE needs reloading, via
      vmx_load_host_state().  With EFER.LMA changing under its feet, it took the
      wrong choice and corrupted userspace's %gs.
      
      This causes 32-on-64 host userspace to fault.
      
      Fix by not touching EFER.LMA; instead ask vmx_set_efer() to change it.
      Signed-off-by: Avi Kivity <avi@redhat.com>
  10. 05 Jul, 2010 2 commits
    • P
      rbtree: Undo augmented trees performance damage and regression · b945d6b2
      Peter Zijlstra committed
      Reimplement augmented RB-trees without sprinkling extra branches
      all over the RB-tree code (which lives in the scheduler hot path).
      
      This approach is 'borrowed' from Fabio's BFQ implementation and
      relies on traversing the rebalance path after the RB-tree-op to
      correct the heap property for insertion/removal and make up for
      the damage done by the tree rotations.
      
      For insertion the rebalance path is trivially that from the new
      node upwards to the root, for removal it is that from the deepest
      node in the path from the to be removed node that will still
      be around after the removal.
      
      [ This patch also fixes a video driver regression reported by
        Ali Gholami Rudi - the memtype->subtree_max_end was updated
        incorrectly. ]
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Ali Gholami Rudi <ali@rudi.ir>
      Cc: Fabio Checconi <fabio@gandalf.sssup.it>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1275414172.27810.27961.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
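The idea of repairing the augmented value along a path after the tree operation, instead of inside it, can be shown on a plain binary search tree. The sketch below deliberately omits red-black colouring and rotations, so it is not the kernel's rbtree; `struct node`, `augment_path()`, and `insert()` are hypothetical names. Only the field name subtree_max_end is taken from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

struct node {
    struct node *parent, *left, *right;
    uint64_t start, end;      /* interval stored in this node */
    uint64_t subtree_max_end; /* augmented value: max end in this subtree */
};

/* Recompute one node's augmented value from itself and its children. */
static uint64_t compute_max_end(const struct node *n)
{
    uint64_t max = n->end;

    if (n->left && n->left->subtree_max_end > max)
        max = n->left->subtree_max_end;
    if (n->right && n->right->subtree_max_end > max)
        max = n->right->subtree_max_end;
    return max;
}

/* Walk from n up to the root, refreshing the augmented value on the way. */
static void augment_path(struct node *n)
{
    for (; n; n = n->parent)
        n->subtree_max_end = compute_max_end(n);
}

/* Unbalanced insert keyed on start, followed by the path fix-up. */
static void insert(struct node **root, struct node *n)
{
    struct node **link = root, *parent = NULL;

    while (*link) {
        parent = *link;
        link = n->start < parent->start ? &parent->left : &parent->right;
    }
    n->parent = parent;
    n->left = n->right = NULL;
    *link = n;
    augment_path(n); /* insertion path: from the new node up to the root */
}
```

Because the fix-up is a separate pass, the core insert loop carries no augmentation branches, which is the property the commit is after for the scheduler hot path.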
    • C
      perf, x86: P4 PMU -- redesign cache events · 39ef13a4
      Cyrill Gorcunov committed
      To support cache events we have reserved the low 6 bits in
      hw_perf_event::config (which is actually a part of the CCCR register
      configuration).
      
      These bits represent the Replay Event metric enumerated in
      enum P4_PEBS_METRIC. The caller should not care about
      which exact bits should be set and how -- the caller
      just chooses one P4_PEBS_METRIC entity and puts it into
      the config. The kernel will track it and set the appropriate
      additional MSR registers (metrics) when needed.
      
      The reason for this redesign was the PEBS enable bit, which
      should not be set until DS (and PEBS sampling) support is
      implemented properly.
      
      TODO
      ====
      
       - PEBS sampling (note it's tricky and works with _one_ counter only
         so for HT machines it will be not that easy to handle both threads)
      
       - tracking of PEBS registers state, a user might need to turn
         PEBS off completely (ie no PEBS enable, no UOP_tag) but some
         other event may need it, such events clashes and should not
         run simultaneously, at moment we just don't support such events
      
       - eventually export user space bits in separate header which will
         allow user apps to configure raw events more conveniently.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1278295769.9540.15.camel@minggr.sh.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
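Packing a metric selector into the low 6 bits of a config word can be sketched like this. The enum members and helper names are hypothetical placeholders, not the kernel's actual P4_PEBS_METRIC values; only the 6-bit width comes from the commit message.

```c
#include <stdint.h>

#define PEBS_METRIC_MASK 0x3fULL /* low 6 bits of the config word */

/* Hypothetical metric selectors, for illustration only. */
enum pebs_metric {
    METRIC_NONE = 0,
    METRIC_EXAMPLE_A = 1,
    METRIC_EXAMPLE_B = 2,
};

/* Store a metric selector without disturbing the upper config bits. */
static uint64_t config_set_metric(uint64_t config, enum pebs_metric m)
{
    return (config & ~PEBS_METRIC_MASK) | ((uint64_t)m & PEBS_METRIC_MASK);
}

/* Extract the metric selector from a config word. */
static enum pebs_metric config_get_metric(uint64_t config)
{
    return (enum pebs_metric)(config & PEBS_METRIC_MASK);
}
```

The caller only picks a selector; translating it into the additional MSR writes is left to the kernel side, as the commit message describes.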
  11. 03 Jul, 2010 1 commit
  12. 01 Jul, 2010 1 commit
  13. 30 Jun, 2010 1 commit
    • F
      x86: Send a SIGTRAP for user icebp traps · a1e80faf
      Frederic Weisbecker committed
      Before we had a generic breakpoint layer, x86 used to send a
      sigtrap for any debug event that happened in userspace,
      except if it was caused by lazy dr7 switches.
      
      Currently we only send such signal for single step or breakpoint
      events.
      
      However, there are three other kinds of debug exceptions:
      
      - debug register access detected: trigger an exception if the
        next instruction touches the debug registers. We don't use
        it.
      - task switch, but we don't use tss.
      - icebp/int01 trap. This instruction (0xf1) is undocumented and
        generates an int 1 exception. Unlike single step through TF
        flag, it doesn't set the single step origin of the exception
        in dr6.
      
      icebp used to be reported to userspace using trap signals,
      but this was accidentally broken by the new breakpoint
      code. Re-enable this. Since this is the only debug event that
      doesn't set anything in dr6, this is all we have to check.
      
      This fixes a regression in Wine where World Of Warcraft got broken,
      as it uses this for software protection checks. Other apps
      probably do as well.
      Reported-and-tested-by: Alexandre Julliard <julliard@winehq.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: 2.6.33.x 2.6.34.x <stable@kernel.org>
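The decision described above can be modelled as a small predicate. This is a sketch under stated assumptions: `should_send_sigtrap()` is a hypothetical function, and `dr6` is assumed to have its reserved bits already masked off so that "nothing set" compares equal to zero. The bit positions follow the x86 DR6 layout (B0-B3 in bits 0-3, BD bit 13, BS bit 14, BT bit 15).

```c
#include <stdbool.h>
#include <stdint.h>

#define DR6_TRAP_BITS 0x000fu /* B0-B3: a breakpoint condition was met */
#define DR6_BD        0x2000u /* debug register access detected */
#define DR6_STEP      0x4000u /* BS: single step */
#define DR6_SWITCH    0x8000u /* BT: task switch */

/*
 * Decide whether a debug exception should be reported with SIGTRAP.
 * dr6 is assumed to have its reserved bits masked off already.
 */
static bool should_send_sigtrap(uint32_t dr6, bool user_mode)
{
    /* icebp is the only debug event that leaves dr6 empty. */
    bool user_icebp = user_mode && dr6 == 0;

    return (dr6 & (DR6_STEP | DR6_TRAP_BITS)) != 0 || user_icebp;
}
```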
  14. 29 Jun, 2010 1 commit
    • M
      kprobes/x86: Fix kprobes to skip prefixes correctly · 567a9fd8
      Masami Hiramatsu committed
      Fix resume_execution() and is_IF_modifier() to skip x86
      instruction prefixes correctly by using x86 instruction
      attribute.
      
      Without this fix, resume_execution() can't handle instructions
      which have non-REX prefixes (REX prefixes are skipped). This
      will cause unexpected kernel panic by hitting bad address when a
      kprobe hits on two-byte ret (e.g. "repz ret" generated for
      Athlon/K8 optimization), because it just checks "repz" and can't
      recognize the "ret" instruction.
      
      These prefixes can be found easily with x86 instruction
      attribute. This patch introduces skip_prefixes() and uses it in
      resume_execution() and is_IF_modifier() to skip prefixes.
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      LKML-Reference: <4C298A6E.8070609@hitachi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
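A userspace approximation of prefix skipping is sketched below. The patch relies on the kernel's x86 instruction attribute tables; this sketch swaps in a hard-coded switch over the legacy prefix bytes instead, and `is_ret()` is a hypothetical helper added only to show the "repz ret" case. It assumes 64-bit mode, where bytes 0x40-0x4f are REX prefixes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy (non-REX) x86 instruction prefixes. */
static bool is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0xf0:                                  /* lock */
    case 0xf2:                                  /* repne/repnz */
    case 0xf3:                                  /* rep/repe/repz */
    case 0x2e: case 0x36: case 0x3e: case 0x26: /* cs, ss, ds, es overrides */
    case 0x64: case 0x65:                       /* fs, gs overrides */
    case 0x66:                                  /* operand-size override */
    case 0x67:                                  /* address-size override */
        return true;
    default:
        return false;
    }
}

/* In 64-bit mode, 0x40-0x4f are REX prefixes. */
static bool is_rex_prefix(uint8_t b)
{
    return (b & 0xf0) == 0x40;
}

/* Return a pointer to the first opcode byte, past any prefixes. */
static const uint8_t *skip_prefixes(const uint8_t *insn, const uint8_t *end)
{
    while (insn < end && is_legacy_prefix(*insn))
        insn++;
    if (insn < end && is_rex_prefix(*insn))
        insn++;
    return insn;
}

/* Hypothetical helper: is this a near/far return, ignoring prefixes? */
static bool is_ret(const uint8_t *insn, const uint8_t *end)
{
    const uint8_t *op = skip_prefixes(insn, end);

    return op < end &&
           (*op == 0xc3 || *op == 0xc2 || *op == 0xcb || *op == 0xca);
}
```

With this, the two-byte sequence `f3 c3` ("repz ret") is recognised as a return, whereas a check that looks only at the first byte would see the `f3` prefix and miss it.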
  15. 25 Jun, 2010 3 commits
    • D
      x86, Calgary: Increase max PHB number · 499a00e9
      Darrick J. Wong committed
      Newer systems (x3950M2) can have 48 PHBs per chassis and 8
      chassis, so bump the limits up and provide an explanation
      of the requirements for each class.
      Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
      Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: Corinna Schultz <cschultz@linux.vnet.ibm.com>
      Cc: <stable@kernel.org>
      LKML-Reference: <20100624212647.GI15515@tux1.beaverton.ibm.com>
      [ v2: Fixed build bug, added back PHBS_PER_CALGARY == 4 ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • F
      x86: Support for instruction breakpoints · f7809daf
      Frederic Weisbecker committed
      Instruction breakpoints need to have a specific length of 0 to
      work. Add this support, but also make sure the user is not
      trying to set an unsupported length, such as a range breakpoint.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
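The length validation can be sketched as a small encoder for the DR7 LEN field. The function and enum names are hypothetical. The sketch assumes that a caller requests an instruction breakpoint with a length of `sizeof(long)` and that any other length is rejected for that type; the hardware encodings (LEN 00 = 1 byte, 01 = 2 bytes, 11 = 4 bytes, 10 = 8 bytes; R/W 00 = execute) follow the x86 debug register layout.

```c
#include <stddef.h>

/* DR7 R/W field encodings. */
enum bp_type {
    BP_EXECUTE = 0x0,
    BP_WRITE   = 0x1,
    BP_RW      = 0x3,
};

/*
 * Return the DR7 LEN encoding for a requested breakpoint, or -1 if the
 * length is not supported for that type.
 */
static int encode_bp_len(enum bp_type type, size_t len_bytes)
{
    if (type == BP_EXECUTE) {
        /* Assumption: callers pass sizeof(long); hardware needs LEN = 0. */
        return len_bytes == sizeof(long) ? 0x0 : -1;
    }

    switch (len_bytes) {
    case 1: return 0x0;
    case 2: return 0x1;
    case 4: return 0x3;
    case 8: return 0x2; /* only meaningful on 64-bit CPUs */
    default: return -1;
    }
}
```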
    • F
      x86: Set resume bit before returning from breakpoint exception · 0c4519e8
      Frederic Weisbecker committed
      Instruction breakpoints trigger before the instruction executes,
      and returning from the breakpoint handler brings us back
      to the instruction that triggered the breakpoint. This naturally
      leads to breakpoint recursion.
      
      To solve this, x86 has the Resume Bit trick. When the cpu flags
      have the RF flag set, the next instruction won't trigger any
      instruction breakpoint, and once this instruction is executed,
      RF is cleared back.
      
      This lets us jump back to the instruction that triggered the
      breakpoint without recursion.
      
      Use this when an instruction breakpoint triggers.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jason Wessel <jason.wessel@windriver.com>
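Setting the resume flag in the saved register state can be sketched as below. `struct saved_regs` and `on_breakpoint()` are hypothetical names for illustration; RF is bit 16 of EFLAGS on x86.

```c
#include <stdbool.h>

#define EFLAGS_RF 0x00010000UL /* Resume Flag, bit 16 of EFLAGS */

/* Hypothetical subset of the register state saved on exception entry. */
struct saved_regs {
    unsigned long ip;
    unsigned long flags;
};

/*
 * Called when a hardware breakpoint fires. For an instruction breakpoint,
 * set RF so that the faulting instruction can execute once on return
 * without retriggering the same breakpoint.
 */
static void on_breakpoint(struct saved_regs *regs, bool is_execute_bp)
{
    if (is_execute_bp)
        regs->flags |= EFLAGS_RF;
}
```

Data breakpoints are left alone here because they trigger after the access completes, so returning to the next instruction does not re-enter the handler.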
  16. 20 Jun, 2010 1 commit
  17. 19 Jun, 2010 1 commit
  18. 18 Jun, 2010 1 commit
  19. 12 Jun, 2010 2 commits
  20. 11 Jun, 2010 2 commits
  21. 10 Jun, 2010 3 commits
  22. 09 Jun, 2010 6 commits