  1. 10 September 2009, 3 commits
    • xen: use stronger barrier after unlocking lock · 2496afbf
      Authored by Yang Xiaowei
      We need to have a stronger barrier between releasing the lock and
      checking for any waiting spinners.  A compiler barrier is not sufficient
      because the CPU's ordering rules do not prevent the read of xl->spinners
      from happening before the unlock assignment, as they are different
      memory locations.
      
      We need to have an explicit barrier to enforce the write-read ordering
      to different memory locations.
      
      Without this barrier, I can't bring up more than 4 HVM guests on one SMP machine.
      
      [ Code and commit comments expanded -J ]
      
      [ Impact: avoid deadlock when using Xen PV spinlocks ]
      Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
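
      A minimal C sketch of the unlock path described above, assuming a lock
      layout like the one below; the struct fields and the helper
      xen_spin_unlock_slow() follow the commit text but are illustrative, not
      the verbatim kernel source:

      struct xen_spinlock {
              unsigned char lock;             /* 0 -> free, 1 -> held */
              unsigned short spinners;        /* count of waiting cpus */
      };

      static inline void xen_spin_unlock(struct xen_spinlock *xl)
      {
              /* Release the lock with a plain store. */
              xl->lock = 0;

              /*
               * barrier() only constrains the compiler.  The CPU may still
               * hoist the read of xl->spinners above the store to xl->lock,
               * because the two are different memory locations, so a full
               * memory barrier is needed to enforce the store->load ordering.
               */
              smp_mb();

              /* Only now is it safe to check for waiters to kick. */
              if (unlikely(xl->spinners))
                      xen_spin_unlock_slow(xl);
      }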
    • xen: only enable interrupts while actually blocking for spinlock · 4d576b57
      Authored by Jeremy Fitzhardinge
      Where possible we enable interrupts while waiting for a spinlock to
      become free, in order to reduce big latency spikes in interrupt handling.
      
      However, at present if we manage to pick up the spinlock just before
      blocking, we'll end up holding the lock with interrupts enabled for a
      while.  This will cause a deadlock if we receive an interrupt in that
      window, and the interrupt handler tries to take the lock too.
      
      Solve this by shrinking the interrupt-enabled region to just around the
      blocking call.
      
      [ Impact: avoid race/deadlock when using Xen PV spinlocks ]
      Reported-by: "Yang, Xiaowei" <xiaowei.yang@intel.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
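
      A condensed sketch of the narrowed window; add_spinner(),
      remove_spinner() and xen_try_lock() are hypothetical stand-ins for the
      real bookkeeping, while xen_clear_irq_pending() and xen_poll_irq() are
      the actual Xen interfaces referred to above:

      static void xen_spin_lock_slow(struct xen_spinlock *xl, int irq,
                                     bool irq_enable)
      {
              /* Register as a waiter while interrupts are still disabled. */
              add_spinner(xl);

              do {
                      xen_clear_irq_pending(irq);

                      /* If we win here, interrupts were never enabled while
                       * holding the lock. */
                      if (xen_try_lock(xl))
                              break;

                      /* Enable interrupts only around the blocking call. */
                      if (irq_enable)
                              local_irq_enable();

                      xen_poll_irq(irq);      /* block until another cpu kicks us */

                      local_irq_disable();
              } while (!xen_try_lock(xl));

              remove_spinner(xl);
      }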
    • xen: make -fstack-protector work under Xen · 577eebea
      Authored by Jeremy Fitzhardinge
      -fstack-protector uses a special per-cpu "stack canary" value.
      gcc generates special code in each function to test the canary to make
      sure that the function's stack hasn't been overrun.
      
      On x86-64, this is simply an offset of %gs, which is the usual per-cpu
      base segment register, so setting it up simply requires loading %gs's
      base as normal.
      
      On i386, the stack protector segment is %gs (rather than the usual kernel
      percpu %fs segment register).  This requires setting up the full kernel
      GDT and then loading %gs accordingly.  We also need to make sure %gs is
      initialized when bringing up secondary cpus.
      
      To keep things consistent, we do the full GDT/segment register setup on
      both architectures.
      
      Because we need to avoid -fstack-protected code before setting up the GDT
      and because there's no way to disable it on a per-function basis, several
      files need to have stack-protector inhibited.
      
      [ Impact: allow Xen booting with stack-protector enabled ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
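
      A condensed sketch of the per-cpu canary setup, loosely based on the
      x86 stackprotector header of that era; the struct layout and helpers
      are illustrative assumptions rather than a quote of the real code.  In
      the build, files that must run before the canary segment exists are
      typically compiled with CFLAGS_foo.o := $(nostackp), where nostackp
      expands to -fno-stack-protector when the compiler supports it.

      /* On i386, gcc expects the canary at %gs:20, so it sits at offset 20
       * of this per-cpu structure. */
      struct stack_canary {
              char __pad[20];
              unsigned long canary;
      };
      DECLARE_PER_CPU(struct stack_canary, stack_canary);

      static inline void boot_init_stack_canary(void)
      {
              u64 canary;
              u64 tsc;

              /* Mix random bytes with the TSC so each boot gets a fresh value. */
              get_random_bytes(&canary, sizeof(canary));
              tsc = native_read_tsc();
              canary += tsc + (tsc << 32UL);

              /* The task's copy and the per-cpu slot reached through the
               * segment register must agree, or stack-protected functions
               * would panic on return. */
              current->stack_canary = canary;
              percpu_write(stack_canary.canary, canary);
      }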
  2. 06 September 2009, 1 commit
  3. 04 September 2009, 1 commit
    • sparc64: Fix bootup with mcount in some configs. · bd4352ca
      Authored by David S. Miller
      Functions invoked early when booting up a cpu can't use
      tracing because mcount requires a valid 'current_thread_info()'
      and TLB mappings to be setup.
      
      The code path of sun4v_register_mondo_queues --> register_one_mondo
      is one such case.  sun4v_register_mondo_queues already has the
      necessary 'notrace' annotation, but register_one_mondo does not.
      
      Normally register_one_mondo is inlined so the bug doesn't trigger,
      but with some config/compiler combinations, it won't be so we
      must properly mark it notrace.
      
      While we're here, add 'notrace' annotations to prom_printf and
      prom_halt so that early error handling won't have the same problem.
      Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
      Reported-by: Leif Sawyer <lsawyer@gci.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
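
      For illustration, this is the shape of the annotation being added (the
      argument list is a simplified stand-in, not a quote of the sparc64
      source): notrace expands to __attribute__((no_instrument_function)), so
      gcc emits no mcount call for the function.

      #include <linux/compiler.h>     /* provides notrace */

      /* Runs before current_thread_info() and the TLB mappings that the
       * mcount hook depends on are usable, so it must never be traced. */
      static notrace void register_one_mondo(unsigned long paddr,
                                             unsigned long type,
                                             unsigned long qmask)
      {
              /* ... program the mondo queue via hypervisor calls ... */
      }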
  4. 03 September 2009, 4 commits
    • sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds. · e6617c6e
      Authored by David S. Miller
      This is a compromise and a temporary workaround for bootup NMI
      watchdog triggers some people see with qla2xxx devices present.
      
      This happens when, for example:
      
      CPU 0 is in the driver init and looping submitting mailbox commands to
      load the firmware, then waiting for completion.
      
      CPU 1 is receiving the device interrupts.  CPU 1 is where the NMI
      watchdog triggers.
      
      CPU 0 is submitting mailbox commands fast enough that by the time CPU
      1 returns from the device interrupt handler, a new one is pending.
      This sequence runs for more than 5 seconds.
      
      The problematic case is CPU 1's timer interrupt running when the
      barrage of device interrupts begin.  Then we have:
      
      	timer interrupt
      	return for softirq checking
      	pending, thus enable interrupts
      
      		 qla2xxx interrupt
      		 return
      		 qla2xxx interrupt
      		 return
      		 ... 5+ seconds pass
      		 final qla2xxx interrupt for fw load
      		 return
      
      	run timer softirq
      	return
      
      At some point in the multi-second qla2xxx interrupt storm we trigger
      the NMI watchdog on CPU 1 from the NMI interrupt handler.
      
      The timer softirq, once we get back to running it, is smart enough to
      run the timer work enough times to make up for the missed timer
      interrupts.
      
      However, the NMI watchdogs (both x86 and sparc) use the timer
      interrupt count to notice the cpu is wedged.  But in the above
      scenario we'll have received only one such timer interrupt by the time
      we finally get back to running the timer softirq.
      
      The default watchdog trigger point is only 5 seconds, which is pretty
      low (the softwatchdog triggers at 60 seconds).  So increase it to 30
      seconds for now.
      Signed-off-by: David S. Miller <davem@davemloft.net>
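
      A conceptual sketch of the per-cpu check being tuned here; the variable
      and helper names are illustrative, not the sparc64 or x86 source.  The
      NMI handler compares the timer-interrupt count against its last
      snapshot and only declares a lockup after the limit's worth of
      unchanged samples, so the limit moves from 5 to 30 seconds:

      #define NMI_WATCHDOG_TIMEOUT    30      /* seconds; previously 5 */

      static DEFINE_PER_CPU(unsigned int, last_irq_sum);
      static DEFINE_PER_CPU(long, alert_counter);

      static void notrace nmi_watchdog_tick(unsigned int timer_irq_count,
                                            unsigned int nmi_hz)
      {
              if (__get_cpu_var(last_irq_sum) == timer_irq_count) {
                      /* No timer tick since the previous NMI sample. */
                      if (++__get_cpu_var(alert_counter) >
                          NMI_WATCHDOG_TIMEOUT * nmi_hz)
                              report_cpu_lockup();    /* hypothetical helper */
              } else {
                      /* Timer interrupts are advancing again: reset state. */
                      __get_cpu_var(last_irq_sum) = timer_irq_count;
                      __get_cpu_var(alert_counter) = 0;
              }
      }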
    • perf_counter/powerpc: Fix cache event codes for POWER7 · a3df6f7d
      Authored by Paul Mackerras
      I had the codes for L1 D-cache load accesses and misses swapped
      around, and the wrong codes for LL-cache accesses and misses.
      This corrects them.
      Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      LKML-Reference: <19103.8514.709300.585484@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
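
      For context, the powerpc cache-event table is indexed as
      [cache][operation][result], so swapping the access and miss columns
      (the bug fixed here) silently reports misses as accesses and vice
      versa.  A sketch of the table's shape (the PM_* codes are placeholders,
      not the corrected POWER7 values):

      static u64 power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
              [C(L1D)] = {
                      /*              RESULT_ACCESS    RESULT_MISS   */
                      [C(OP_READ)]  = { PM_LD_REF_L1,  PM_LD_MISS_L1 },
                      [C(OP_WRITE)] = { PM_ST_REF_L1,  PM_ST_MISS_L1 },
              },
              /* ... LL-cache, TLB and branch entries use the same layout ... */
      };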
    • [IA64] fix csum_ipv6_magic() · 5afe18d2
      Authored by Jiri Bohac
      The 32-bit parameters (len and csum) of csum_ipv6_magic() are passed in 64-bit
      registers in2 and in4. The high order 32 bits of the registers were never
      cleared, and garbage was sometimes calculated into the checksum.
      
      Fix this by clearing the high order 32 bits of these registers.
      Signed-off-by: Jiri Bohac <jbohac@suse.cz>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
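
      The actual fix is in ia64 assembly (zero-extending in2 and in4, for
      example with zxt4, before they enter the checksum arithmetic); below is
      a small C illustration of why the stale high bits matter, using made-up
      names:

      #include <stdint.h>

      /* 'reg' nominally carries a 32-bit value but arrives in a 64-bit
       * register whose upper half may hold garbage from the caller. */
      static uint64_t add_u32_to_csum(uint64_t acc, uint64_t reg)
      {
              uint64_t clean = reg & 0xffffffffULL;   /* what zxt4 achieves */

              /* Without the mask, the garbage would be folded into acc. */
              return acc + clean;
      }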
    • [IA64] Fix warning in dma-mapping.c · f2486f26
      Authored by Tony Luck
      arch/ia64/kernel/dma-mapping.c:14: warning: control reaches end of non-void function
      arch/ia64/kernel/dma-mapping.c:14: warning: no return statement in function returning non-void
      
      This warning was introduced by commit: 390bd132
      	Add dma_debug_init() for ia64
      Signed-off-by: Tony Luck <tony.luck@intel.com>
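
      The warning points at an initcall that is declared to return int but
      falls off the end; the fix is of the following shape (reconstructed
      from the warning text, not quoted verbatim):

      static int __init dma_init(void)
      {
              dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);

              return 0;       /* previously missing, hence both warnings */
      }
      fs_initcall(dma_init);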
  5. 29 August 2009, 1 commit
  6. 27 August 2009, 6 commits
  7. 26 August 2009, 4 commits
    • x86: Fix vSMP boot crash · 295594e9
      Authored by Yinghai Lu
      2.6.31-rc7 does not boot on vSMP systems:
      
      [    8.501108] CPU31: Thermal monitoring enabled (TM1)
      [    8.501127] CPU 31 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
      [    8.650254] CPU31: Intel(R) Xeon(R) CPU           E5540  @ 2.53GHz stepping 04
      [    8.710324] Brought up 32 CPUs
      [    8.713916] Total of 32 processors activated (162314.96 BogoMIPS).
      [    8.721489] ERROR: parent span is not a superset of domain->span
      [    8.727686] ERROR: domain->groups does not contain CPU0
      [    8.733091] ERROR: groups don't span domain->span
      [    8.737975] ERROR: domain->cpu_power not set
      [    8.742416]
      
      Ravikiran Thirumalai bisected it to:
      
      | commit 2759c328
      | x86: don't call read_apic_id if !cpu_has_apic
      
      The problem is that on vSMP systems the CPUID derived
      initial-APICIDs are overlapping - so we need to fall
      back on hard_smp_processor_id() which reads the local
      APIC.
      
      Both come from the hardware (influenced by firmware
      though) so it's a tough call which one to trust.
      
      Doing the quirk expresses the vSMP property properly
      and also does not affect other systems, so we go for
      this solution instead of a revert.
      Reported-and-Tested-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shai Fultheim <shai@scalex86.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <4A944D3C.5030100@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
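
      A sketch of the fallback described above; read_cpuid_initial_apicid()
      is a hypothetical stand-in for the CPUID-derived path, while
      is_vsmp_box() and hard_smp_processor_id() are the existing x86
      interfaces the text refers to:

      static int reliable_apicid(void)
      {
              /*
               * On vSMP the CPUID-derived initial APIC IDs can overlap, so
               * trust the local APIC register instead.
               */
              if (is_vsmp_box())
                      return hard_smp_processor_id();

              return read_cpuid_initial_apicid();
      }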
    • x86, xen: Initialize cx to suppress warning · 7adb4df4
      Authored by H. Peter Anvin
      Initialize cx before calling xen_cpuid(), in order to suppress the
      "may be used uninitialized in this function" warning.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
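
      The change is of this shape (reconstructed from the description, not
      the verbatim diff): give cx a defined value before the call so gcc
      cannot see a path on which it is read uninitialized.

      unsigned int ax, bx, cx = 0, dx;        /* cx initialized to quiet gcc */

      /* ... leaf selection elided ... */
      xen_cpuid(&ax, &bx, &cx, &dx);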
    • x86, xen: Suppress WP test on Xen · d560bc61
      Authored by Jeremy Fitzhardinge
      Xen always runs on CPUs which properly support WP enforcement in
      privileged mode, so there's no need to test for it.
      
      This also works around a crash reported by Arnd Hannemann, though I
      think it's just a band-aid for that case.
      Reported-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
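
      One plausible shape of the change, going by the description alone and
      not the actual patch (the placement and the wp_works_ok flag are
      assumptions):

      /* In the 32-bit boot path, instead of always probing: */
      if (xen_pv_domain())
              boot_cpu_data.wp_works_ok = 1;  /* Xen guarantees WP in ring 0 */
      else
              test_wp_bit();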
    • sparc64: Validate linear D-TLB misses. · d8ed1d43
      Authored by David S. Miller
      When page alloc debugging is not enabled, we essentially accept any
      virtual address for linear kernel TLB misses.  But with kgdb, kernel
      address probing, and other facilities we can try to access arbitrary
      crap.
      
      So, make sure the address we miss on will translate to physical memory
      that actually exists.
      
      In order to make this work we have to embed the valid address bitmap
      into the kernel image.  And in order to make that less expensive we
      make an adjustment, in that the max physical memory address is
      decreased to "1 << 41", even on the chips that support a 42-bit
      physical address space.  We can do this because bit 41 indicates
      "I/O space" and thus covers non-memory ranges.
      
      The result of this is that:
      
      1) kpte_linear_bitmap shrinks from 2K to 1K in size
      
      2) we need 64K more for the valid address bitmap
      
      We can't let the valid address bitmap be dynamically allocated
      once we start using it to validate TLB misses, otherwise we have
      crazy issues to deal with wrt. recursive TLB misses and such.
      
      If we're in a TLB miss it could be the deepest trap level that's legal
      inside of the cpu.  So if we TLB miss referencing the bitmap, the cpu
      will be out of trap levels and enter RED state.
      
      To guard against out-of-range accesses to the bitmap, we have to check
      to make sure no bits in the physical address above bit 40 are set.  We
      could export and use last_valid_pfn for this check, but that's just an
      unnecessary extra memory reference.
      
      On the plus side of all this, since we load all of these translations
      into the special 4MB mapping TSB, and we check the TSB first for TLB
      misses, there should be absolutely no real cost for these new checks
      in the TLB miss path.
      
      Reported-by: heyongli@gmail.com
      Signed-off-by: David S. Miller <davem@davemloft.net>
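
      A C-level sketch of the check the miss handler performs (the real code
      is sparc64 assembly; the names here are illustrative): reject any
      physical address with bits set at or above bit 41, otherwise look up
      its 4MB-page index in the statically allocated bitmap that is now
      embedded in the kernel image.

      #define MAX_PHYS_ADDRESS_BITS   41      /* bit 41 selects I/O space */
      #define ILOG2_4MB               22      /* one bitmap bit per 4MB page */

      /* Static, not dynamically allocated: a TLB miss on the bitmap itself
       * at the deepest trap level would push the cpu into RED state. */
      static unsigned long sparc64_valid_addr_bitmap[
              (1UL << (MAX_PHYS_ADDRESS_BITS - ILOG2_4MB)) / BITS_PER_LONG];

      static bool paddr_is_valid_ram(unsigned long paddr)
      {
              /* Any bit above bit 40 set means this cannot be real memory. */
              if (paddr >> MAX_PHYS_ADDRESS_BITS)
                      return false;

              return test_bit(paddr >> ILOG2_4MB, sparc64_valid_addr_bitmap);
      }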
  8. 25 August 2009, 2 commits
    • x86: Fix build with older binutils and consolidate linker script · c62e4320
      Authored by Jan Beulich
      binutils prior to 2.17 can't deal with the currently possible
      situation of a new segment following the per-CPU segment, but
      that new segment being empty - objcopy misplaces the .bss (and
      perhaps also the .brk) sections outside of any segment.
      
      However, the current ordering of sections really just appears
      to be the effect of cumulative unrelated changes; re-ordering
      things makes it easy to guarantee that the segment following
      the per-CPU one is non-empty, and at the same time eliminates the
      need for the bogus data.init2 segment.
      
      Once touching this code, also use the various data section
      helper macros from include/asm-generic/vmlinux.lds.h.
      
      -v2: fix !SMP builds.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: <sam@ravnborg.org>
      LKML-Reference: <4A94085D02000078000119A5@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Fix an incorrect argument of reserve_bootmem() · a6a06f7b
      Authored by Amerigo Wang
      This line looks suspicious, because if this is true, then the
      'flags' parameter of function reserve_bootmem_generic() will be
      unused when !CONFIG_NUMA. I don't think this is what we want.
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: akpm@linux-foundation.org
      LKML-Reference: <20090821083709.5098.52505.sendpatchset@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
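
      Reconstructed shape of the fix (paraphrased, not the verbatim diff): in
      the !CONFIG_NUMA branch the caller's flags should be forwarded instead
      of being replaced by a constant, which is what left the parameter
      unused.

      int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
                                         int flags)
      {
              /* ... range checks and NUMA handling elided ... */
      #ifndef CONFIG_NUMA
              reserve_bootmem(phys, len, flags);      /* was BOOTMEM_DEFAULT,
                                                         which ignored 'flags' */
      #endif
              return 0;
      }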
  9. 24 August 2009, 3 commits
  10. 22 August 2009, 2 commits
    • x86: don't call '->send_IPI_mask()' with an empty mask · b04e6373
      Authored by Linus Torvalds
      As noted in 83d349f3 ("x86: don't send
      an IPI to the empty set of CPU's"), some APIC's will be very unhappy
      with an empty destination mask.  That commit added a WARN_ON() for that
      case, and avoided the resulting problem, but didn't fix the underlying
      reason for why those empty mask cases happened.
      
      This fixes that by checking whether the result of 'cpumask_andnot()'
      (the target set minus the current CPU) still has any other CPUs left
      to be sent a TLB flush, and by not calling down to the IPI code if the
      mask is empty.
      
      The reason this started happening at all is that we started passing just
      the CPU mask pointers around in commit 4595f962 ("x86: change
      flush_tlb_others to take a const struct cpumask"), and when we did that,
      the cpumask was no longer thread-local.
      
      Before that commit, flush_tlb_mm() used to create its own copy of
      'mm->cpu_vm_mask' and pass that copy down to the low-level flush
      routines after having tested that it was not empty.  But after changing
      it to just pass down the CPU mask pointer, the lower level TLB flush
      routines would now get a pointer to that 'mm->cpu_vm_mask', and that
      could still change - and become empty - after the test due to other
      CPU's having flushed their own TLB's.
      
      See
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=13933
      
      for details.
      Tested-by: Thomas Björnell <thomas.bjornell@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
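
      A condensed sketch of the guard (the real native path keeps per-sender
      flush state; get_flush_mask() is a hypothetical stand-in for it):
      cpumask_andnot() returns non-zero only when the result is non-empty,
      so an empty set never reaches ->send_IPI_mask().

      static void flush_tlb_others_ipi(const struct cpumask *cpumask,
                                       struct mm_struct *mm, unsigned long va)
      {
              unsigned int sender = smp_processor_id();
              struct cpumask *flush_mask = get_flush_mask(sender);

              /* Copy the caller's (possibly still changing) mask, minus us. */
              if (cpumask_andnot(flush_mask, cpumask, cpumask_of(sender))) {
                      /* ... record mm and va for the receivers ... */
                      apic->send_IPI_mask(flush_mask,
                                          INVALIDATE_TLB_VECTOR_START + sender);
                      /* ... wait for the receivers to acknowledge ... */
              }
      }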
    • x86: don't send an IPI to the empty set of CPU's · 83d349f3
      Authored by Linus Torvalds
      The default_send_IPI_mask_logical() function uses the "flat" APIC mode
      to send an IPI to a set of CPU's at once, but if that set happens to be
      empty, some older local APIC's will apparently be rather unhappy.  So
      just warn if a caller gives us an empty mask, and ignore it.
      
      This fixes a regression in 2.6.30.x, due to commit 4595f962 ("x86:
      change flush_tlb_others to take a const struct cpumask"), documented
      here:
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=13933
      
      which causes a silent lock-up.  It only seems to happen on PPro, P2, P3
      and Athlon XP cores.  Most developers sadly (or not so sadly, if you're
      a developer..) have more modern CPU's.  Also, on x86-64 we don't use the
      flat APIC mode, so it would never trigger there even if the APIC didn't
      like sending an empty IPI mask.
      Reported-by: Pavel Vilim <wylda@volny.cz>
      Reported-and-tested-by: Thomas Björnell <thomas.bjornell@gmail.com>
      Reported-and-tested-by: Martin Rogge <marogge@onlinehome.de>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
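
      The shape of the guard, reconstructed from the description: warn once
      about an empty destination mask and return before touching the APIC
      (the interrupt disabling around the actual send is elided here).

      static void default_send_IPI_mask_logical(const struct cpumask *cpumask,
                                                int vector)
      {
              unsigned long mask = cpumask_bits(cpumask)[0];

              /* Older flat-mode local APICs misbehave on an empty set. */
              if (WARN_ONCE(!mask, "empty IPI mask"))
                      return;

              __default_send_IPI_dest_field(mask, vector, apic->dest_logical);
      }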
  11. 21 August 2009, 1 commit
  12. 20 August 2009, 3 commits
    • xen: rearrange things to fix stackprotector · ce2eef33
      Authored by Jeremy Fitzhardinge
      Make sure the stack-protector segment registers are properly set up
      before calling any functions which may have stack-protection compiled
      into them.
      
      [ Impact: prevent Xen early-boot crash when stack-protector is enabled ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    • x86: make sure load_percpu_segment has no stackprotector · 5416c266
      Authored by Jeremy Fitzhardinge
      load_percpu_segment() is used to set up the per-cpu segment registers,
      which are also used for -fstack-protector.  Make sure that the
      load_percpu_segment() function doesn't have stackprotector enabled.
      
      [ Impact: allow percpu setup before calling stack-protected functions ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    • clockevent: Prevent dead lock on clockevents_lock · f833bab8
      Authored by Suresh Siddha
      Currently clockevents_notify() is called with interrupts enabled at
      some places and interrupts disabled at some other places.
      
      This results in a deadlock in this scenario.
      
      cpu A holds clockevents_lock in clockevents_notify() with irqs enabled
      cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled
      cpu C is doing set_mtrr(), which tries to rendezvous all the cpus.
      
      This results in C and A coming to the rendezvous point and waiting
      for B. B is stuck forever waiting for the spinlock and thus never
      reaches the rendezvous point.
      
      Fix the clockevents code so that clockevents_lock is taken with
      interrupts disabled and thus avoid the above deadlock.
      
      Also call lapic_timer_propagate_broadcast() on the destination cpu so
      that we avoid calling smp_call_function() in the clockevents notifier
      chain.
      
      This issue left us wondering if we need to change the MTRR rendezvous
      logic to use stop machine logic (instead of smp_call_function) or add
      a check in spinlock debug code to see if there are other spinlocks
      which get taken under both interrupts-enabled and interrupts-disabled conditions.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: "Brown Len" <len.brown@intel.com>
      LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
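
      A condensed sketch of the locking change (the real clockevents_notify()
      carries the notifier and CPU_DEAD cleanup logic): taking the lock with
      interrupts disabled removes the mixed irqs-on/irqs-off usage that let
      the rendezvous deadlock form.

      static DEFINE_SPINLOCK(clockevents_lock);

      void clockevents_notify(unsigned long reason, void *arg)
      {
              unsigned long flags;

              spin_lock_irqsave(&clockevents_lock, flags);    /* was spin_lock() */
              clockevents_do_notify(reason, arg);
              /* ... CLOCK_EVT_NOTIFY_CPU_DEAD device cleanup elided ... */
              spin_unlock_irqrestore(&clockevents_lock, flags);
      }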
  13. 19 August 2009, 4 commits
  14. 18 August 2009, 5 commits