1. 14 Oct 2013, 1 commit
  2. 05 Sep 2013, 1 commit
  3. 02 Sep 2013, 4 commits
  4. 26 Aug 2013, 1 commit
  5. 23 Aug 2013, 2 commits
  6. 20 Aug 2013, 1 commit
    • x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock · 17405453
      Authored by Yoshihiro YUNOMAE
      Prevent crash_kexec() from deadlocking on ioapic_lock. When
      crash_kexec() is executed on a CPU, the CPU takes ioapic_lock in
      disable_IO_APIC(). So if a CPU receives an NMI while it already holds
      ioapic_lock, the crash_kexec() called from the NMI handler's panic()
      deadlocks trying to take the lock again.
      
      In this patch, ioapic_lock is zapped/initialized before disable_IO_APIC().
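
      The idea can be illustrated with a small user-space analogy (a hedged
      sketch; the hand-rolled lock and the crash_path() name are illustrative,
      not the kernel's code):

        #include <stdatomic.h>
        #include <stdio.h>

        static atomic_int lock_analogy;            /* 0 = unlocked, 1 = held */

        static void spin_lock(atomic_int *l)
        {
                int expected = 0;
                while (!atomic_compare_exchange_weak(l, &expected, 1))
                        expected = 0;              /* retry until we win the CAS */
        }

        /* analogy of crash_kexec()->disable_IO_APIC(): may run while another
         * context holds the lock and will never release it */
        static void crash_path(void)
        {
                atomic_store(&lock_analogy, 0);    /* the fix: zap/re-init the lock first */
                spin_lock(&lock_analogy);          /* now guaranteed to succeed */
                puts("disable_IO_APIC analogy ran without deadlocking");
                atomic_store(&lock_analogy, 0);
        }

        int main(void)
        {
                spin_lock(&lock_analogy);          /* simulate a wedged lock holder */
                crash_path();                      /* would spin forever without the zap */
                return 0;
        }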
      
      You can reproduce this deadlock as follows:
      
      1. Add mdelay(1000) after raw_spin_lock_irqsave() in
         native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c
      
         The deadlock can occur without this modification, but the delay makes
         it much more likely to trigger.
      
      2. Build and install the kernel
      
      3. Set up the OS so that it runs panic() and kexec when an NMI is injected:
          # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf
          # vim /etc/default/grub
            add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line
          # grub2-mkconfig
      
      4. Reboot the OS
      
      5. Run the following command for each vcpu on the guest:
          # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinity; done;
         By running these commands, the cpus repeatedly take ioapic_lock to set
         the affinity.
      
      6. Inject an NMI (push a dump button, or execute 'virsh inject-nmi <domain>'
         if you use a VM). After the NMI is injected, panic() is called in
         NMI-handler context. kexec then runs from panic(), but it stops in a
         deadlock on ioapic_lock in crash_kexec()->machine_crash_shutdown()->
         native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()->
         clear_IO_APIC_pin()->ioapic_read_entry().
      Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/20130820070107.28245.83806.stgit@yunodevel
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 19 Aug 2013, 1 commit
  8. 16 Aug 2013, 3 commits
  9. 14 Aug 2013, 4 commits
  10. 13 Aug 2013, 2 commits
    • x86, microcode, AMD: Fix early microcode loading · 84516098
      Authored by Torsten Kaiser
      load_microcode_amd() (and the helper it uses) should not take a cpu
      parameter. The microcode loading does not depend on which CPU calls it
      with respect to the patches loaded, since they end up in a global list
      for all CPUs anyway.
      
      The change from cpu to x86family in load_microcode_amd() now allows
      dropping the code that messes with cpu_data(cpu) from
      collect_cpu_info_amd_early(), which is wrong anyway because at that
      point the per-cpu cpu_info is not yet set up (those values would later
      be overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info()).
      
      Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(),
      because it is only used in one place and, without the cpuinfo_x86
      accesses, not much of it was left.
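
      A hedged sketch of the resulting interface (names follow the text above;
      the body is a placeholder, not the kernel implementation):

        #include <stddef.h>

        enum ucode_state { UCODE_OK, UCODE_NFOUND, UCODE_ERROR };

        /* before: enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size);
         * after:  the CPU family replaces the cpu number, since the parsed
         *         patches go into one global list shared by all CPUs */
        static enum ucode_state load_microcode_amd(unsigned int x86_family,
                                                   const unsigned char *data,
                                                   size_t size)
        {
                (void)data; (void)size;
                /* parse the container and keep only patches matching
                 * x86_family in the global patch list (placeholder logic) */
                return x86_family ? UCODE_OK : UCODE_ERROR;
        }

        int main(void)
        {
                return load_microcode_amd(0x15, NULL, 0) == UCODE_OK ? 0 : 1;
        }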
      Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
      [ Fengguang: build fix ]
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      [ Boris: adapt it to current tree. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86 · 8c6b79bb
      Authored by Torsten Kaiser
      cpu_has_amd_erratum() is buggy because it uses the per-cpu cpu_info
      before it has been filled in by smp_store_boot_cpu_info() /
      smp_store_cpu_info().
      
      If early microcode loading is enabled, its collect_cpu_info_amd_early()
      will fill in ->x86, so the fallback to boot_cpu_data is not used. But
      ->x86_vendor is not filled in and remains X86_VENDOR_INTEL, resulting in
      no errata fixes getting applied, and my system hangs on boot.
      
      Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only
      caller, init_amd(), receives a struct cpuinfo_x86 as a parameter, and
      the set_cpu_bug() call that cpu_has_amd_erratum() controls also only
      uses that struct.
      
      So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum(),
      which lets the broken fallback be dropped.
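
      A hedged sketch of the resulting calling convention (the vendor constant
      and erratum table are illustrative; only the shape of the interface
      follows the text):

        #include <stdbool.h>

        #define X86_VENDOR_AMD 2                   /* illustrative value */

        struct cpuinfo_x86 {
                unsigned char x86_vendor;          /* filled in by the caller */
                unsigned char x86;                 /* CPU family */
        };

        /* operates only on the cpuinfo_x86 it is handed; no per-cpu fallback */
        static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
        {
                if (cpu->x86_vendor != X86_VENDOR_AMD)
                        return false;
                (void)erratum;                     /* match cpu->x86 against the range table */
                return true;
        }

        static void init_amd(struct cpuinfo_x86 *c)
        {
                static const int amd_erratum_383[] = { 0 };   /* range table elided */

                if (cpu_has_amd_erratum(c, amd_erratum_383)) {
                        /* set_cpu_bug(c, ...) in the real kernel */
                }
        }

        int main(void)
        {
                struct cpuinfo_x86 c = { .x86_vendor = X86_VENDOR_AMD, .x86 = 0x10 };
                init_amd(&c);
                return 0;
        }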
      
      [ Boris: Drop WARN_ON() since we're called only from init_amd() ]
      Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
  11. 12 Aug 2013, 2 commits
  12. 09 Aug 2013, 3 commits
    • x86, ticketlock: Add slowpath logic · 96f853ea
      Authored by Jeremy Fitzhardinge
      Maintain a flag in the LSB of the ticket lock tail which indicates
      whether anyone is in the lock slowpath and may need kicking when
      the current holder unlocks.  The flags are set when the first locker
      enters the slowpath, and cleared when unlocking to an empty queue (ie,
      no contention).
      
      In the specific implementation of lock_spinning(), make sure to set
      the slowpath flags on the lock just before blocking.  We must do
      this before the last-chance pickup test to prevent a deadlock
      with the unlocker:
      
      Unlocker			Locker
      				test for lock pickup
      					-> fail
      unlock
      test slowpath
      	-> false
      				set slowpath flags
      				block
      
      Whereas this works in any ordering:
      
      Unlocker			Locker
      				set slowpath flags
      				test for lock pickup
      					-> fail
      				block
      unlock
      test slowpath
      	-> true, kick
      
      If the unlocker finds that the lock has the slowpath flag set but it is
      actually uncontended (ie, head == tail, so nobody is waiting), then it
      clears the slowpath flag.
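
      The encoding can be sketched as follows (hedged: field widths and names
      are simplified from the kernel's arch_spinlock_t, and the kick itself is
      elided):

        #include <stdio.h>

        #define TICKET_SLOWPATH_FLAG 0x1           /* lives in the LSB of tail */
        #define TICKET_LOCK_INC      0x2           /* tickets advance by 2, freeing the LSB */

        struct ticket_lock { unsigned short head, tail; };

        static void ticket_unlock(struct ticket_lock *l)
        {
                l->head += TICKET_LOCK_INC;        /* a locked add in the real code */

                if (l->tail & TICKET_SLOWPATH_FLAG) {
                        if (l->head == (unsigned short)(l->tail & ~TICKET_SLOWPATH_FLAG))
                                l->tail &= ~TICKET_SLOWPATH_FLAG;  /* empty queue: clear */
                        /* else: kick the CPU waiting on ticket l->head */
                }
        }

        int main(void)
        {
                /* one holder, no waiters, slowpath flag set */
                struct ticket_lock l = { .head = 0,
                                         .tail = TICKET_LOCK_INC | TICKET_SLOWPATH_FLAG };

                ticket_unlock(&l);
                printf("flag after unlock to empty queue: %u\n",
                       l.tail & TICKET_SLOWPATH_FLAG);   /* prints 0 */
                return 0;
        }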
      
      The unlock code uses a locked add to update the head counter. This also
      acts as a full memory barrier, so that it's safe to subsequently read
      back the slowpath-flag state, knowing that the updated lock is visible
      to the other CPUs. If it were an unlocked add, the flag read might be
      forwarded from the store buffer before the head update was visible to
      the other CPUs, which could result in a deadlock.
      
      Unfortunately this means we need to do a locked instruction when
      unlocking with PV ticketlocks.  However, if PV ticketlocks are not
      enabled, then the old non-locked "add" is the only unlocking code.
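
      The barrier argument, sketched in portable C (hedged: __atomic_fetch_add
      with SEQ_CST stands in here for x86's "lock add"; the plain version
      shows the problematic ordering):

        /* "lock add": the head update is globally visible before the next read */
        static inline void unlock_with_locked_add(unsigned short *head)
        {
                __atomic_fetch_add(head, 2, __ATOMIC_SEQ_CST);
                /* a slowpath-flag read placed here cannot be satisfied early */
        }

        /* plain add: the store may sit in the store buffer, so a following
         * flag read can be satisfied before other CPUs see the unlock */
        static inline void unlock_with_plain_add(unsigned short *head)
        {
                *head += 2;
        }

        int main(void)
        {
                unsigned short head = 0;
                unlock_with_locked_add(&head);
                unlock_with_plain_add(&head);
                return head == 4 ? 0 : 1;
        }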
      
      Note: this code relies on gcc making sure that unlikely() code is out of
      line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it
      doesn't, the generated code isn't too bad, but it's definitely suboptimal.
      
      Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
      version of this change, which has been folded in.
      Thanks to Stephan Diestelhorst for commenting on some code which relied
      on an inaccurate reading of the x86 memory ordering rules.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com
      Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
      Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, pvticketlock: Use callee-save for lock_spinning · 354714dd
      Authored by Jeremy Fitzhardinge
      Although the lock_spinning calls in the spinlock code are on the
      uncommon path, their presence can cause the compiler to generate many
      more register save/restores in the function pre/postamble, which is in
      the fast path. To avoid this, convert it to use the pvops callee-save
      calling convention, which defers all the save/restores until the actual
      function is called, keeping the fastpath clean.
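
      A hedged sketch of the intent (the kernel does this with
      PV_CALLEE_SAVE_REGS_THUNK / PV_CALLEE_SAVE; the plain-C analogy below
      merely keeps the slowpath cold and out of line so the caller's fast
      path stays free of spill code):

        /* In the kernel (roughly):
         *     PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
         *     pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
         * the thunk saves/restores all registers in the callee itself.
         */
        __attribute__((noinline, cold))
        static void lock_spinning_slowpath(void *lock, unsigned int want)
        {
                (void)lock; (void)want;    /* block until kicked; body elided */
        }

        int main(void)
        {
                int dummy_lock = 0;
                lock_spinning_slowpath(&dummy_lock, 0);
                return 0;
        }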
      Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.com
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Tested-by: Attilio Rao <attilio.rao@citrix.com>
      Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, spinlock: Replace pv spinlocks with pv ticketlocks · 545ac138
      Authored by Jeremy Fitzhardinge
      Rather than outright replacing the entire spinlock implementation in
      order to paravirtualize it, keep the ticket lock implementation but add
      a couple of pvops hooks on the slow paths (long spin on lock, unlocking
      a contended lock).
      
      Ticket locks have a number of nice properties, but they also have some
      surprising behaviours in virtual environments.  They enforce a strict
      FIFO ordering on cpus trying to take a lock; however, if the hypervisor
      scheduler does not schedule the cpus in the correct order, the system can
      waste a huge amount of time spinning until the next cpu can take the lock.
      
      (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
      http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
      
      To address this, we add two hooks:
       - __ticket_spin_lock which is called after the cpu has been
         spinning on the lock for a significant number of iterations but has
         failed to take the lock (presumably because the cpu holding the lock
         has been descheduled).  The lock_spinning pvop is expected to block
         the cpu until it has been kicked by the current lock holder.
       - __ticket_spin_unlock, which, on releasing a contended lock (there
         are more cpus waiting with tail tickets), checks whether the next
         cpu is blocked and wakes it if so.
      
      When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
      functions causes all the extra code to go away.
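
      The shape of the two hooks, sketched (hedged: types are simplified from
      the kernel's arch_spinlock_t and __ticket_t):

        struct pv_lock_ops_sketch {
                /* called after SPIN_THRESHOLD spin iterations without
                 * acquiring the lock; expected to block the cpu until it is
                 * kicked by the current lock holder */
                void (*lock_spinning)(void *lock, unsigned int ticket);

                /* called when unlocking a contended lock: wake the cpu that
                 * is blocked waiting on the given ticket */
                void (*unlock_kick)(void *lock, unsigned int ticket);
        };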
      
      Results:
      =======
      setup: 32-core machine running a 32-vcpu KVM guest (HT off) with 8GB RAM
      base = 3.11-rc
      patched = base + pvspinlock V12
      
      +-----------------+----------------+--------+
       dbench (Throughput in MB/sec. Higher is better)
      +-----------------+----------------+--------+
      |   base (stdev %)|patched(stdev%) | %gain  |
      +-----------------+----------------+--------+
      | 15035.3   (0.3) |15150.0   (0.6) |   0.8  |
      |  1470.0   (2.2) | 1713.7   (1.9) |  16.6  |
      |   848.6   (4.3) |  967.8   (4.3) |  14.0  |
      |   652.9   (3.5) |  685.3   (3.7) |   5.0  |
      +-----------------+----------------+--------+
      
      pvspinlock shows benefits for overcommit ratios > 1 in PLE-enabled cases,
      and undercommit results are flat.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.com
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Tested-by: Attilio Rao <attilio.rao@citrix.com>
      [ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t ]
      Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  13. 07 Aug 2013, 10 commits
  14. 05 Aug 2013, 2 commits
    • x86: Correctly detect hypervisor · 9df56f19
      Authored by Jason Wang
      We try to handle hypervisor compatibility mode by detecting hypervisors
      in a specific order. This is not robust, since hypervisors may implement
      each other's features.
      
      This patch handles the situation by always choosing the hypervisor found
      at the highest CPUID leaf. This is done by letting .detect() return a
      priority instead of true/false, re-using the CPUID leaf where the
      signature was found as the priority (or 1 if it was found via DMI). Then
      we simply pick the hypervisor with the highest priority. Other, more
      sophisticated detection methods could also be implemented on top.
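
      The selection logic, sketched (hedged: the detect functions and priority
      values below are illustrative; in the kernel, .detect() returns the
      CPUID leaf where the signature was found):

        #include <stdio.h>

        struct hypervisor_x86 {
                const char *name;
                unsigned int (*detect)(void);      /* 0 means "not detected" */
        };

        static unsigned int detect_low(void)  { return 0x40000000; }  /* base leaf */
        static unsigned int detect_high(void) { return 0x40000100; }  /* higher leaf */

        static const struct hypervisor_x86 hypervisors[] = {
                { "hv_low",  detect_low  },
                { "hv_high", detect_high },
        };

        int main(void)
        {
                const struct hypervisor_x86 *best = NULL;
                unsigned int best_prio = 0;

                for (unsigned int i = 0; i < sizeof(hypervisors) / sizeof(hypervisors[0]); i++) {
                        unsigned int prio = hypervisors[i].detect();

                        if (prio > best_prio) {
                                best_prio = prio;
                                best = &hypervisors[i];
                        }
                }
                if (best)
                        printf("detected %s (priority 0x%x)\n", best->name, best_prio);
                return 0;
        }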
      
      Suggested by H. Peter Anvin and Paolo Bonzini.
      Acked-by: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Doug Covelli <dcovelli@vmware.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dan Hecht <dhecht@vmware.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • perf/x86: Fix intel QPI uncore event definitions · c9601247
      Authored by Vince Weaver
      John McCalpin reports that the "drs_data" and "ncb_data" QPI
      uncore events are missing the "extra bit" and always return zero
      values unless the bit is properly set.
      
      More details from him:
      
       According to the Xeon E5-2600 Product Family Uncore Performance
       Monitoring Guide, Table 2-94, about 1/2 of the QPI Link Layer events
       (including the ones that "perf" calls "drs_data" and "ncb_data") require
       that the "extra bit" be set.
      
       This was confusing for a while -- a note at the bottom of page 94 says
       that the "extra bit" is bit 16 of the control register.
       Unfortunately, Table 2-86 clearly says that bit 16 is reserved and must
       be zero.  Looking around a bit, I found that bit 21 appears to be the
       correct "extra bit", and further investigation shows that "perf" actually
       agrees with me:
      	[root@c560-003.stampede]# cat /sys/bus/event_source/devices/uncore_qpi_0/format/event
      	config:0-7,21
      
       So the command
      	# perf -e "uncore_qpi_0/event=drs_data/"
       Is the same as
      	# perf -e "uncore_qpi_0/event=0x02,umask=0x08/"
       While it should be
      	# perf -e "uncore_qpi_0/event=0x102,umask=0x08/"
      
       I confirmed that this last version gives results that agree with the
       amount of data that I expected the STREAM benchmark to move across the QPI
       link in the second (cross-chip) test of the original script.
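
      The encoding implied by the "config:0-7,21" format can be sketched as
      plain arithmetic (hedged: the umask placement at config:8-15 is the
      usual uncore layout and is assumed here):

        #include <stdint.h>
        #include <stdio.h>

        /* "config:0-7,21": bits 0-7 of the event value map to config[7:0],
         * and the 9th bit (the "extra bit") maps to config bit 21 */
        static uint64_t qpi_event_to_config(unsigned int event, unsigned int umask)
        {
                uint64_t config = event & 0xff;

                if (event & 0x100)
                        config |= 1ULL << 21;      /* the extra bit */

                return config | ((uint64_t)umask << 8);
        }

        int main(void)
        {
                /* drs_data with the fix: event=0x102, umask=0x08 */
                printf("config = 0x%llx\n",
                       (unsigned long long)qpi_event_to_config(0x102, 0x08));
                return 0;
        }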
      Reported-by: John McCalpin <mccalpin@tacc.utexas.edu>
      Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
      Cc: zheng.z.yan@intel.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1308021037280.26119@vincent-weaver-1.um.maine.edu
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  15. 03 Aug 2013, 2 commits
    • x86: sysfb: move EFI quirks from efifb to sysfb · 2995e506
      Authored by David Herrmann
      The EFI FB quirks from efifb.c are useful for simple-framebuffer devices
      as well. Apply them by default so we can convert efifb.c to use
      efi-framebuffer platform devices.
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Link: http://lkml.kernel.org/r/1375445127-15480-5-git-send-email-dh.herrmann@gmail.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: provide platform-devices for boot-framebuffers · e3263ab3
      Authored by David Herrmann
      The current situation regarding boot framebuffers (VGA, VESA/VBE, EFI) on
      x86 causes trouble when loading multiple fbdev drivers. The global
      "struct screen_info" does not provide any state tracking of which
      drivers use the FBs. request_mem_region() theoretically works, but
      unfortunately vesafb/efifb ignore it due to quirks for broken boards.
      
      Avoid this by creating platform framebuffer devices with a pointer
      to the "struct screen_info" as platform data. Drivers can now register
      platform drivers, and the driver core will refuse to let multiple
      drivers be active simultaneously.
      
      We keep the screen_info available for backwards-compatibility. Drivers
      can be converted in follow-up patches.
      
      Different devices are created for VGA/VESA/EFI FBs to allow multiple
      drivers to be loaded on distro kernels. We create:
       - "vesa-framebuffer" for VBE/VESA graphics FBs
       - "efi-framebuffer" for EFI FBs
       - "platform-framebuffer" for everything else
      This allows vesafb, efifb and others to be loaded simultaneously, with
      each picking up only the FB types it supports.
      
      Apart from platform-framebuffer devices, this also introduces a
      compatibility option for "simple-framebuffer" drivers which recently got
      introduced for OF based systems. If CONFIG_X86_SYSFB is selected, we
      try to match the screen_info against a simple-framebuffer supported
      format. If we succeed, we create a "simple-framebuffer" device instead
      of a platform-framebuffer.
      This allows the simplefb.c driver to be reused across architectures and
      also allows a SimpleDRM driver to be introduced. There is no need to have
      vesafb.c, efifb.c, simplefb.c and more just to carry architecture-specific
      quirks in their setup routines.
      
      Instead, we now move the architecture-specific quirks into x86 setup code
      and provide a generic simple-framebuffer. For backwards compatibility (if
      strange formats are used), we still allow vesafb/efifb to be loaded
      simultaneously to pick up all remaining devices.
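
      A hedged, kernel-style sketch of the device-creation idea (simplified;
      not the commit's exact code, and the simple-framebuffer format match is
      elided):

        #include <linux/err.h>
        #include <linux/init.h>
        #include <linux/platform_device.h>
        #include <linux/screen_info.h>

        static int __init sysfb_sketch_init(void)
        {
                struct platform_device *pd;
                const char *name;

                if (screen_info.orig_video_isVGA == VIDEO_TYPE_EFI)
                        name = "efi-framebuffer";
                else if (screen_info.orig_video_isVGA == VIDEO_TYPE_VLFB)
                        name = "vesa-framebuffer";
                else
                        name = "platform-framebuffer";

                /* the global screen_info becomes the platform data; the
                 * driver core ensures only one driver binds at a time */
                pd = platform_device_register_data(NULL, name, 0,
                                                   &screen_info,
                                                   sizeof(screen_info));
                return IS_ERR(pd) ? PTR_ERR(pd) : 0;
        }
        device_initcall(sysfb_sketch_init);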
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Link: http://lkml.kernel.org/r/1375445127-15480-4-git-send-email-dh.herrmann@gmail.com
      Tested-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  16. 31 Jul 2013, 1 commit