1. 14 Jun 2016, 1 commit
    • KVM: Fix steal clock warp during guest CPU hotplug · 2348140d
      Authored by Wanpeng Li
      Sometimes, after CPU hotplug you can observe a spike in stolen time
      (100%) followed by the CPU being marked as 100% idle when it's actually
      busy with a CPU hog task.  The trace looks like the following:
      
       cpuhp/1-12    [001] d.h1   167.461657: account_process_tick: steal = 1291385514, prev_steal_time = 0
       cpuhp/1-12    [001] d.h1   167.461659: account_process_tick: steal_jiffies = 1291
        <idle>-0     [001] d.h1   167.462663: account_process_tick: steal = 18732255, prev_steal_time = 1291000000
        <idle>-0     [001] d.h1   167.462664: account_process_tick: steal_jiffies = 18446744072437
      
      The sudden decrease of "steal" causes steal_jiffies to underflow.
      The root cause is kvm_steal_time being reset to 0 after a CPU is
      hot-plugged back in.  Instead, the preexisting value can be used,
      which is what the core scheduler code expects (see the sketch after
      this entry).
      
      John Stultz also reported a similar issue after guest S3.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1465813966-3116-2-git-send-email-wanpeng.li@hotmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2348140d
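      A minimal sketch of the fixed registration path, using names from
      arch/x86/kernel/kvm.c of that era (the fix essentially amounts to not
      zeroing the per-cpu buffer when a CPU comes back online):
      
      	static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
      
      	static void kvm_register_steal_time(void)
      	{
      		int cpu = smp_processor_id();
      		struct kvm_steal_time *st = &per_cpu(steal_time, cpu);
      
      		if (!has_steal_clock)
      			return;
      
      		/*
      		 * Do NOT memset(st, 0, sizeof(*st)) here: the scheduler's
      		 * prev_steal_time bookkeeping expects the host's steal
      		 * counter to stay monotonic across hotplug and guest S3.
      		 */
      		wrmsrl(MSR_KVM_STEAL_TIME,
      		       (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
      	}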
  2. 22 Apr 2016, 1 commit
    • x86/paravirt: Remove paravirt_enabled() · 867fe800
      Authored by Luis R. Rodriguez
      Now that all previous paravirt_enabled() uses have been replaced with
      proper x86 semantics by the preceding patches, we can remove the
      now-unused paravirt_enabled() mechanism.
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Acked-by: Juergen Gross <jgross@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andrew.cooper3@citrix.com
      Cc: andriy.shevchenko@linux.intel.com
      Cc: bigeasy@linutronix.de
      Cc: boris.ostrovsky@oracle.com
      Cc: david.vrabel@citrix.com
      Cc: ffainelli@freebox.fr
      Cc: george.dunlap@citrix.com
      Cc: glin@suse.com
      Cc: jlee@suse.com
      Cc: josh@joshtriplett.org
      Cc: julien.grall@linaro.org
      Cc: konrad.wilk@oracle.com
      Cc: kozerkov@parallels.com
      Cc: lenb@kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-acpi@vger.kernel.org
      Cc: lv.zheng@intel.com
      Cc: matt@codeblueprint.co.uk
      Cc: mbizon@freebox.fr
      Cc: rjw@rjwysocki.net
      Cc: robert.moore@intel.com
      Cc: rusty@rustcorp.com.au
      Cc: tiwai@suse.de
      Cc: toshi.kani@hp.com
      Cc: xen-devel@lists.xensource.com
      Link: http://lkml.kernel.org/r/1460592286-300-15-git-send-email-mcgrof@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      867fe800
  3. 31 Mar 2016, 1 commit
  4. 22 Mar 2016, 1 commit
  5. 20 May 2015, 1 commit
  6. 11 May 2015, 1 commit
    • locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS · 62c7a1e9
      Authored by Ingo Molnar
      Valentin Rothberg reported that we use CONFIG_QUEUED_SPINLOCKS
      in arch/x86/kernel/paravirt_patch_32.c, while the symbol is
      called CONFIG_QUEUED_SPINLOCK (note the extra 'S').
      
      But the typo was natural: the proper English term for such
      a generic object would be 'queued spinlocks' - so rename
      this and the related symbols accordingly, to the plural form.
      Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      62c7a1e9
  7. 08 May 2015, 1 commit
    • locking/pvqspinlock, x86: Enable PV qspinlock for KVM · bf0c7c34
      Authored by Waiman Long
      This patch adds the necessary KVM-specific code to allow KVM to
      support the CPU halting and kicking operations needed by the queued
      spinlock PV code (the two hooks are sketched after this entry).
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-11-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bf0c7c34
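      A hedged sketch of the two hooks this patch wires into the PV lock
      ops; kvm_kick_cpu() and kvm_wait() are the names used in
      arch/x86/kernel/kvm.c, with the bodies simplified here:
      
      	static void kvm_kick_cpu(int cpu)
      	{
      		int apicid = per_cpu(x86_cpu_to_apicid, cpu);
      
      		/* Ask the host to unhalt the target vCPU. */
      		kvm_hypercall2(KVM_HC_KICK_CPU, 0, apicid);
      	}
      
      	static void kvm_wait(u8 *ptr, u8 val)
      	{
      		unsigned long flags;
      
      		if (in_nmi())
      			return;
      
      		local_irq_save(flags);
      
      		/* Re-check with IRQs off so a concurrent kick cannot be lost. */
      		if (READ_ONCE(*ptr) == val)
      			safe_halt();	/* sleep until kicked or interrupted */
      
      		local_irq_restore(flags);
      	}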
  8. 15 Apr 2015, 1 commit
  9. 18 Feb 2015, 1 commit
    • x86/spinlocks/paravirt: Fix memory corruption on unlock · d6abfdb2
      Authored by Raghavendra K T
      The paravirt spinlock code clears the slowpath flag after doing the
      unlock.  As explained by Linus, it currently does:
      
                      prev = *lock;
                      add_smp(&lock->tickets.head, TICKET_LOCK_INC);
      
                      /* add_smp() is a full mb() */
      
                      if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
                              __ticket_unlock_slowpath(lock, prev);
      
      which is *exactly* the kind of things you cannot do with spinlocks,
      because after you've done the "add_smp()" and released the spinlock
      for the fast-path, you can't access the spinlock any more.  Exactly
      because a fast-path lock might come in, and release the whole data
      structure.
      
      Linus suggested that we should not do any writes to lock after unlock(),
      and we can move slowpath clearing to fastpath lock.
      
      So this patch implements the fix with:
      
       1. Moving slowpath flag to head (Oleg):
          Unlocked locks don't care about the slowpath flag; therefore we can keep
          it set after the last unlock, and clear it again on the first (try)lock.
          -- this removes the write after unlock.  Note that keeping the
          slowpath flag set would result in unnecessary kicks.
          By moving the slowpath flag from the tail to the head ticket we also avoid
          the need to access both the head and tail tickets on unlock.
      
       2. Use xadd to avoid a read/write after unlock when checking the need
          for unlock_kick (Linus):
          We further avoid the need for a read-after-release by using xadd;
          the prev head value will include the slowpath flag and indicate if we
          need to do PV kicking of suspended spinners -- on modern chips xadd
          isn't (much) more expensive than an add + load.
      
      Result:
       setup: 16-core Sandy Bridge (32 CPUs with HT), 8GB RAM, 16-vCPU guest
       benchmark overcommit %improve
       kernbench  1x           -0.13
       kernbench  2x            0.02
       dbench     1x           -1.77
       dbench     2x           -0.63
      
      [Jeremy: Hinted missing TICKET_LOCK_INC for kick]
      [Oleg: Moved slowpath flag to head, ticket_equals idea]
      [PeterZ: Added detailed changelog]
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Jones <drjones@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: a.ryabinin@samsung.com
      Cc: dave@stgolabs.net
      Cc: hpa@zytor.com
      Cc: jasowang@redhat.com
      Cc: jeremy@goop.org
      Cc: paul.gortmaker@windriver.com
      Cc: riel@redhat.com
      Cc: tglx@linutronix.de
      Cc: waiman.long@hp.com
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/20150215173043.GA7471@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d6abfdb2
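      The resulting unlock fast path, sketched under the TICKET_LOCK_INC /
      TICKET_SLOWPATH_FLAG definitions of the time (simplified; the real
      code also keeps a plain-add path when paravirt ticketlocks are
      disabled):
      
      	static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
      	{
      		__ticket_t head;
      
      		/* xadd() is a full barrier and returns the pre-unlock head,
      		 * so the lock word is never touched again after release. */
      		head = xadd(&lock->tickets.head, TICKET_LOCK_INC);
      
      		if (unlikely(head & TICKET_SLOWPATH_FLAG)) {
      			head &= ~TICKET_SLOWPATH_FLAG;
      			/* Kick the waiter that now owns the lock. */
      			__ticket_unlock_kick(lock, (head + TICKET_LOCK_INC));
      		}
      	}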
  10. 10 Dec 2014, 1 commit
    • x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit · 29fa6825
      Authored by Andy Lutomirski
      paravirt_enabled has the following effects:
      
       - Disables the F00F bug workaround warning.  There is no F00F bug
         workaround any more because Linux's standard IDT handling already
         works around the F00F bug, but the warning still exists.  This
         is only cosmetic, and, in any event, there is no such thing as
         KVM on a CPU with the F00F bug.
      
       - Disables 32-bit APM BIOS detection.  On a KVM paravirt system,
         there should be no APM BIOS anyway.
      
       - Disables tboot.  I think that the tboot code should check the
         CPUID hypervisor bit directly if it matters.
      
       - paravirt_enabled disables espfix32.  espfix32 should *not* be
         disabled under KVM paravirt.
      
      The last point is the purpose of this patch.  It fixes a leak of the
      high 16 bits of the kernel stack address on 32-bit KVM paravirt
      guests.  Fixes CVE-2014-8134.
      
      Cc: stable@vger.kernel.org
      Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      29fa6825
  11. 14 Oct 2014, 1 commit
  12. 27 Aug 2014, 1 commit
    • x86: Replace __get_cpu_var uses · 89cbc767
      Authored by Christoph Lameter
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are storing and retrieving data from the current
      processor's percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as:
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() only ever performs an address calculation.  However,
      store and retrieve operations could use a segment prefix (or a global
      register on other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var() into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations
      that take the offset directly.  Address calculations are thereby
      avoided, and fewer registers are used in the generated code.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processor's instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      89cbc767
  13. 22 May 2014, 1 commit
    • x86: fix page fault tracing when KVM guest support enabled · 65a7f03f
      Authored by Dave Hansen
      I noticed on some of my systems that page fault tracing doesn't
      work:
      
      	cd /sys/kernel/debug/tracing
      	echo 1 > events/exceptions/enable
      	cat trace;
      	# nothing shows up
      
      I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
      KVM VM, enabling that option breaks page fault tracing, and
      disabling fixes it.  I tried on some old kernels and this does
      not appear to be a regression: it never worked.
      
      There are two page-fault entry functions today: one used when tracing
      is on and another when it is off.  The KVM code calls do_page_fault()
      directly instead of calling the traced version:
      
      > dotraplinkage void __kprobes
      > do_async_page_fault(struct pt_regs *regs, unsigned long
      > error_code)
      > {
      >         enum ctx_state prev_state;
      >
      >         switch (kvm_read_and_reset_pf_reason()) {
      >         default:
      >                 do_page_fault(regs, error_code);
      >                 break;
      >         case KVM_PV_REASON_PAGE_NOT_PRESENT:
      
      I'm also having problems with the page fault tracing on bare
      metal (same symptom of no trace output).  I'm unsure if it's
      related.
      
      Steven had an alternative approach with zero overhead when tracing
      is off, which includes the standard nops even when tracing is
      disabled.  I'm unconvinced that the extra complexity of his approach:
      
      	http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home
      
      is worth it, especially considering that the KVM code is already
      making page fault entry slower here.  This solution is dirt-simple
      (see the sketch after this entry).
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      65a7f03f
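      A sketch of the dirt-simple fix, based on do_async_page_fault() in
      arch/x86/kernel/kvm.c (simplified; context-tracking and idle
      bookkeeping in the PV cases is omitted):
      
      	dotraplinkage void __kprobes
      	do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
      	{
      		switch (kvm_read_and_reset_pf_reason()) {
      		default:
      	#ifdef CONFIG_TRACING
      			/* Call the traced entry point so the tracepoint fires. */
      			trace_do_page_fault(regs, error_code);
      	#else
      			do_page_fault(regs, error_code);
      	#endif
      			break;
      		case KVM_PV_REASON_PAGE_NOT_PRESENT:
      			/* Page is swapped out by the host. */
      			kvm_async_pf_task_wait((u32)read_cr2());
      			break;
      		case KVM_PV_REASON_PAGE_READY:
      			kvm_async_pf_task_wake((u32)read_cr2());
      			break;
      		}
      	}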
  14. 24 Apr 2014, 1 commit
    • kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation · 9326638c
      Authored by Masami Hiramatsu
      Use the NOKPROBE_SYMBOL() macro for protecting functions
      from kprobes instead of the __kprobes annotation under
      arch/x86.
      
      This applies the nokprobe_inline annotation in some cases,
      because NOKPROBE_SYMBOL() would inhibit inlining by
      referring to the symbol's address.
      
      This just folds a bunch of previous NOKPROBE_SYMBOL()
      cleanup patches for x86 into one patch.
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Link: http://lkml.kernel.org/r/20140417081814.26341.51656.stgit@ltc230.yrl.intra.hitachi.co.jp
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jonathan Lebon <jlebon@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9326638c
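      An illustrative before/after with hypothetical handler names
      (NOKPROBE_SYMBOL() and nokprobe_inline are the real macros from
      include/linux/kprobes.h):
      
      	/* Before: the definition itself carries the annotation. */
      	static void __kprobes handle_foo(struct pt_regs *regs)
      	{
      		/* ... must never be hit by a kprobe ... */
      	}
      
      	/* After: a plain definition plus an explicit blacklist entry.
      	 * NOKPROBE_SYMBOL() takes the symbol's address, which would
      	 * defeat inlining, hence nokprobe_inline for small helpers. */
      	static void handle_bar(struct pt_regs *regs)
      	{
      		/* ... must never be hit by a kprobe ... */
      	}
      	NOKPROBE_SYMBOL(handle_bar);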
  15. 22 Feb 2014, 1 commit
  16. 30 Jan 2014, 3 commits
  17. 09 Nov 2013, 1 commit
  18. 30 Oct 2013, 1 commit
    • KVM: Fix modprobe failure for kvm_intel/kvm_amd · d780a312
      Authored by Tim Gardner
      The x86-specific KVM init creates a new, conflicting
      debugfs directory, which causes modprobe issues
      with kvm_intel and kvm_amd.  For example,
      
      sudo modprobe kvm_amd
      modprobe: ERROR: could not insert 'kvm_amd': Bad address
      
      The simplest fix is to just rename the directory. The following
      KVM config options are set:
      
      CONFIG_KVM_GUEST=y
      CONFIG_KVM_DEBUG_FS=y
      CONFIG_HAVE_KVM=y
      CONFIG_HAVE_KVM_IRQCHIP=y
      CONFIG_HAVE_KVM_IRQ_ROUTING=y
      CONFIG_HAVE_KVM_EVENTFD=y
      CONFIG_KVM_APIC_ARCHITECTURE=y
      CONFIG_KVM_MMIO=y
      CONFIG_KVM_ASYNC_PF=y
      CONFIG_HAVE_KVM_MSI=y
      CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
      CONFIG_KVM=m
      CONFIG_KVM_INTEL=m
      CONFIG_KVM_AMD=m
      CONFIG_KVM_DEVICE_ASSIGNMENT=y
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      [Change debugfs directory name. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d780a312
  19. 15 Oct 2013, 1 commit
  20. 19 Aug 2013, 1 commit
  21. 14 Aug 2013, 1 commit
  22. 05 Aug 2013, 1 commit
    • x86: Correctly detect hypervisor · 9df56f19
      Authored by Jason Wang
      We try to handle hypervisor compatibility mode by detecting hypervisors
      in a specific order.  This is not robust, since hypervisors may implement
      each other's features.
      
      This patch tries to handle this situation by always choosing the last one
      in the CPUID leaves.  This is done by letting .detect() return a priority
      instead of true/false, reusing the CPUID leaf where the signature was
      found as the priority (or 1 if it was found by DMI).  Then we can just
      pick the hypervisor with the highest priority.  Other, more sophisticated
      detection methods could also be implemented on top (the selection loop is
      sketched after this entry).
      
      Suggested by H. Peter Anvin and Paolo Bonzini.
      Acked-by: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Doug Covelli <dcovelli@vmware.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dan Hecht <dhecht@vmware.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      9df56f19
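      A close sketch of the selection loop the patch puts in
      arch/x86/kernel/cpu/hypervisor.c (.detect() now returns the CPUID leaf
      where the signature was found, or 0):
      
      	static inline void __init detect_hypervisor_vendor(void)
      	{
      		const struct hypervisor_x86 *h = NULL, * const *p;
      		uint32_t pri, max_pri = 0;
      
      		for (p = hypervisors; p < hypervisors + ARRAY_SIZE(hypervisors); p++) {
      			pri = (*p)->detect();	/* priority: CPUID leaf, or 0 */
      			if (pri > max_pri) {
      				max_pri = pri;
      				h = *p;		/* highest priority wins */
      			}
      		}
      
      		if (h)
      			printk(KERN_INFO "Hypervisor detected: %s\n", h->name);
      		x86_hyper = h;
      	}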
  23. 15 Jul 2013, 1 commit
    • x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Authored by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      and are flagged as __cpuinit -- so if we remove the __cpuinit from
      arch-specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  24. 08 Mar 2013, 2 commits
    • context_tracking: Restore correct previous context state on exception exit · 6c1e0256
      Authored by Frederic Weisbecker
      On exception exit, we restore the previous context tracking state based
      on the regs of the interrupted frame.  Iff that frame is in user mode,
      as reported by the user_mode() helper, we restore the context tracking
      user mode.
      
      However there is a tiny chunk of low-level arch code after we pass
      through user_enter() and before the CPU eventually resumes userspace.
      If an exception happens in this tiny window, exception_enter()
      correctly exits the context tracking user mode but exception_exit()
      won't restore it, because of the value returned by user_mode(regs).
      
      As a result we may return to userspace with the wrong context tracking
      state.
      
      To fix this, change exception_enter() to return the context tracking
      state prior to its call and pass this saved state to exception_exit().
      This restores the real context tracking state of the interrupted frame
      (sketched after this entry).
      
      (Maybe this patch was suggested to me; I don't recall exactly.  If so,
      sorry for the missing credit.)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      6c1e0256
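      The resulting pattern at every exception handler; the API names are
      real, the handler name here is hypothetical:
      
      	dotraplinkage void do_example_trap(struct pt_regs *regs, long error_code)
      	{
      		enum ctx_state prev_state;
      
      		/* Returns the context tracking state in effect before the
      		 * exception, whatever user_mode(regs) claims. */
      		prev_state = exception_enter();
      
      		/* ... handle the exception ... */
      
      		exception_exit(prev_state);	/* restore exactly that state */
      	}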
    • context_tracking: Move exception handling to generic code · 56dd9470
      Authored by Frederic Weisbecker
      Exception handling in context tracking should share a common
      treatment: on entry we exit user mode if the exception triggered
      in that context, and on exception exit we return to that previous
      context.
      
      Generalize this to avoid duplication across archs.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      56dd9470
  25. 11 Feb 2013, 1 commit
  26. 26 Jan 2013, 1 commit
    • x86, kvm: Fix kvm's use of __pa() on percpu areas · 5dfd486c
      Authored by Dave Hansen
      In short, it is illegal to call __pa() on an address holding
      a percpu variable.  This replaces those __pa() calls with
      slow_virt_to_phys().  All of the cases in this patch are
      in boot time (or CPU hotplug time at worst) code, so the
      slow pagetable walking in slow_virt_to_phys() is not expected
      to have a performance impact.
      
      The times when this actually matters are pretty obscure
      (certain 32-bit NUMA systems), but it _does_ happen.  It is
      important to keep KVM guests working on these systems because
      the real hardware is getting harder and harder to find.
      
      This bug first manifested as a plain hang at boot
      after this message:
      
      	CPU 0 irqstacks, hard=f3018000 soft=f301a000
      
      or, sometimes, it would actually make it out to the console:
      
      [    0.000000] BUG: unable to handle kernel paging request at ffffffff
      
      I eventually traced it down to the KVM async pagefault code.
      This can be worked around by disabling that code either at
      compile-time, or on the kernel command-line.
      
      The kvm async pagefault code was injecting page faults into
      the guest, which the guest misinterpreted because its
      "reason" was not being properly sent from the host.
      
      The guest passes the physical address of a per-cpu async page
      fault structure via an MSR to the host.  Since __pa() is
      broken on percpu data, the physical address it sent was
      basically bogus and the host went scribbling on random data.
      The guest never saw the real reason for the page fault (it
      was injected by the host), assumed that the kernel had taken
      a _real_ page fault, and panic()'d.  The behavior varied,
      though, depending on what got corrupted by the bad write.
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20130122212435.4905663F@kernel.stglabs.ibm.com
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      5dfd486c
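      A sketch of the conversion in the async-pf MSR setup, using the names
      from arch/x86/kernel/kvm.c at the time (simplified):
      
      	static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
      
      	static void kvm_guest_cpu_init(void)
      	{
      		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF)) {
      			/* Broken: __pa() assumes the linear mapping, which
      			 * does not hold for percpu addresses on 32-bit NUMA:
      			 *   u64 pa = __pa(&__get_cpu_var(apf_reason));
      			 * Fixed: walk the page tables instead. */
      			u64 pa = slow_virt_to_phys(&__get_cpu_var(apf_reason));
      
      			wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);
      		}
      	}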
  27. 24 Jan 2013, 1 commit
  28. 18 Dec 2012, 1 commit
    • Add rcu user eqs exception hooks for async page fault · 9b132fbe
      Authored by Li Zhong
      This patch adds user eqs exception hooks for the async page fault
      page-not-present code path, to exit the user eqs and re-enter it as
      necessary.
      
      Async page fault is different from other exceptions in that it may be
      triggered from the idle process, so we still need rcu_irq_enter() and
      rcu_irq_exit() to exit the CPU idle eqs when needed, to protect the
      code that needs to use RCU.
      
      As Frederic pointed out it would be safest and simplest to protect the
      whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
      code there deeply for potential RCU uses and ensure it will never be
      extended later to use RCU.".
      
      However, we'd better re-enter the CPU idle eqs if we got the exception
      in the CPU idle eqs, by calling rcu_irq_exit() before native_safe_halt().
      
      So the patch does what Frederic suggested for the rcu_irq_*() API usage
      here, except that I moved the rcu_irq_*() pair originally in
      do_async_page_fault() into kvm_async_pf_task_wait().
      
      That's because I think it's better to have the rcu_irq_*() pair in one
      function (rcu_irq_exit() after rcu_irq_enter()), especially here:
      kvm_async_pf_task_wait() has other callers, which might cause
      rcu_irq_exit() to be called without a matching rcu_irq_enter() before
      it, which is illegal if the CPU happens to be in the RCU idle state.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      9b132fbe
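      A heavily simplified sketch of the resulting shape of
      kvm_async_pf_task_wait(); the real code also queues a wait element
      keyed by the token, and its idle test is more careful than this:
      
      	void kvm_async_pf_task_wait(u32 token)
      	{
      		bool halted = is_idle_task(current);	/* simplified test */
      
      		rcu_irq_enter();	/* leave the idle eqs: RCU is used below */
      
      		if (!halted) {
      			schedule();	/* task context: sleep until PAGE_READY */
      		} else {
      			/* Idle cannot schedule(); halt instead, and keep RCU
      			 * in its idle eqs while halted. */
      			rcu_irq_exit();
      			native_safe_halt();
      			rcu_irq_enter();
      		}
      
      		rcu_irq_exit();		/* matching exit in the same function */
      	}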
  29. 29 Nov 2012, 1 commit
  30. 28 Nov 2012, 1 commit
  31. 23 Oct 2012, 1 commit
    • KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT · c5e015d4
      Authored by Sasha Levin
      KVM_PV_REASON_PAGE_NOT_PRESENT kicks the CPU out of idleness, but we
      haven't marked that spot as an exit from idleness.
      
      Not doing so can cause RCU warnings such as:
      
      [  732.788386] ===============================
      [  732.789803] [ INFO: suspicious RCU usage. ]
      [  732.790032] 3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63 Tainted: G        W
      [  732.790032] -------------------------------
      [  732.790032] include/linux/rcupdate.h:738 rcu_read_lock() used illegally while idle!
      [  732.790032]
      [  732.790032] other info that might help us debug this:
      [  732.790032]
      [  732.790032]
      [  732.790032] RCU used illegally from idle CPU!
      [  732.790032] rcu_scheduler_active = 1, debug_locks = 1
      [  732.790032] RCU used illegally from extended quiescent state!
      [  732.790032] 2 locks held by trinity-child31/8252:
      [  732.790032]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff83a67528>] __schedule+0x178/0x8f0
      [  732.790032]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81152bde>] cpuacct_charge+0xe/0x200
      [  732.790032]
      [  732.790032] stack backtrace:
      [  732.790032] Pid: 8252, comm: trinity-child31 Tainted: G        W    3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63
      [  732.790032] Call Trace:
      [  732.790032]  [<ffffffff8118266b>] lockdep_rcu_suspicious+0x10b/0x120
      [  732.790032]  [<ffffffff81152c60>] cpuacct_charge+0x90/0x200
      [  732.790032]  [<ffffffff81152bde>] ? cpuacct_charge+0xe/0x200
      [  732.790032]  [<ffffffff81158093>] update_curr+0x1a3/0x270
      [  732.790032]  [<ffffffff81158a6a>] dequeue_entity+0x2a/0x210
      [  732.790032]  [<ffffffff81158ea5>] dequeue_task_fair+0x45/0x130
      [  732.790032]  [<ffffffff8114ae29>] dequeue_task+0x89/0xa0
      [  732.790032]  [<ffffffff8114bb9e>] deactivate_task+0x1e/0x20
      [  732.790032]  [<ffffffff83a67c29>] __schedule+0x879/0x8f0
      [  732.790032]  [<ffffffff8117e20d>] ? trace_hardirqs_off+0xd/0x10
      [  732.790032]  [<ffffffff810a37a5>] ? kvm_async_pf_task_wait+0x1d5/0x2b0
      [  732.790032]  [<ffffffff83a67cf5>] schedule+0x55/0x60
      [  732.790032]  [<ffffffff810a37c4>] kvm_async_pf_task_wait+0x1f4/0x2b0
      [  732.790032]  [<ffffffff81139e50>] ? abort_exclusive_wait+0xb0/0xb0
      [  732.790032]  [<ffffffff81139c25>] ? prepare_to_wait+0x25/0x90
      [  732.790032]  [<ffffffff810a3a66>] do_async_page_fault+0x56/0xa0
      [  732.790032]  [<ffffffff83a6a6e8>] async_page_fault+0x28/0x30
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Acked-by: Gleb Natapov <gleb@redhat.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      c5e015d4
  32. 23 Aug 2012, 1 commit
  33. 16 Aug 2012, 1 commit
  34. 16 Jul 2012, 1 commit
  35. 12 Jul 2012, 1 commit
    • KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check · fc73373b
      Authored by Prarit Bhargava
      While debugging I noticed that, unlike all the other hypervisor code in
      the kernel, kvm does not have an entry for x86_hyper, which is used in
      detect_hypervisor_platform() and results in a nice printk in the
      syslog.  This is really only a stub function, but it does make kvm more
      consistent with the other hypervisors (the stub is sketched after this
      entry).
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: Avi Kivity <avi@redhat.com>
      fc73373b
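      The stub itself, essentially as the patch adds it to
      arch/x86/kernel/kvm.c:
      
      	static bool __init kvm_detect(void)
      	{
      		if (!kvm_para_available())
      			return false;
      		return true;
      	}
      
      	const struct x86_hyper x86_hyper_kvm __refconst = {
      		.name	= "KVM",
      		.detect	= kvm_detect,
      	};
      	EXPORT_SYMBOL_GPL(x86_hyper_kvm);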
  36. 25 Jun 2012, 1 commit
    • KVM guest: guest side for eoi avoidance · ab9cf499
      Authored by Michael S. Tsirkin
      The idea is simple: there's a bit, per APIC, in guest memory,
      that tells the guest that it does not need EOI.
      The guest tests it using a single test-and-clear operation - this is
      necessary so that the host can detect interrupt nesting - and if it is
      set, the guest can skip the EOI MSR.
      
      I ran a simple microbenchmark to show the exit reduction
      (note: for testing, you need to apply the follow-up patch
      'kvm: host side for eoi optimization' plus a qemu patch
      I posted separately, on the host):
      
      Before:
      
      Performance counter stats for 'sleep 1s':
      
                  47,357 kvm:kvm_entry                                                [99.98%]
                       0 kvm:kvm_hypercall                                            [99.98%]
                       0 kvm:kvm_hv_hypercall                                         [99.98%]
                   5,001 kvm:kvm_pio                                                  [99.98%]
                       0 kvm:kvm_cpuid                                                [99.98%]
                  22,124 kvm:kvm_apic                                                 [99.98%]
                  49,849 kvm:kvm_exit                                                 [99.98%]
                  21,115 kvm:kvm_inj_virq                                             [99.98%]
                       0 kvm:kvm_inj_exception                                        [99.98%]
                       0 kvm:kvm_page_fault                                           [99.98%]
                  22,937 kvm:kvm_msr                                                  [99.98%]
                       0 kvm:kvm_cr                                                   [99.98%]
                       0 kvm:kvm_pic_set_irq                                          [99.98%]
                       0 kvm:kvm_apic_ipi                                             [99.98%]
                  22,207 kvm:kvm_apic_accept_irq                                      [99.98%]
                  22,421 kvm:kvm_eoi                                                  [99.98%]
                       0 kvm:kvm_pv_eoi                                               [99.99%]
                       0 kvm:kvm_nested_vmrun                                         [99.99%]
                       0 kvm:kvm_nested_intercepts                                    [99.99%]
                       0 kvm:kvm_nested_vmexit                                        [99.99%]
                       0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                       0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                       0 kvm:kvm_invlpga                                              [99.99%]
                       0 kvm:kvm_skinit                                               [99.99%]
                      57 kvm:kvm_emulate_insn                                         [99.99%]
                       0 kvm:vcpu_match_mmio                                          [99.99%]
                       0 kvm:kvm_userspace_exit                                       [99.99%]
                       2 kvm:kvm_set_irq                                              [99.99%]
                       2 kvm:kvm_ioapic_set_irq                                       [99.99%]
                  23,609 kvm:kvm_msi_set_irq                                          [99.99%]
                       1 kvm:kvm_ack_irq                                              [99.99%]
                     131 kvm:kvm_mmio                                                 [99.99%]
                     226 kvm:kvm_fpu                                                  [100.00%]
                       0 kvm:kvm_age_page                                             [100.00%]
                       0 kvm:kvm_try_async_get_page                                    [100.00%]
                       0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                       0 kvm:kvm_async_pf_not_present                                    [100.00%]
                       0 kvm:kvm_async_pf_ready                                       [100.00%]
                       0 kvm:kvm_async_pf_completed
      
             1.002100578 seconds time elapsed
      
      After:
      
       Performance counter stats for 'sleep 1s':
      
                  28,354 kvm:kvm_entry                                                [99.98%]
                       0 kvm:kvm_hypercall                                            [99.98%]
                       0 kvm:kvm_hv_hypercall                                         [99.98%]
                   1,347 kvm:kvm_pio                                                  [99.98%]
                       0 kvm:kvm_cpuid                                                [99.98%]
                   1,931 kvm:kvm_apic                                                 [99.98%]
                  29,595 kvm:kvm_exit                                                 [99.98%]
                  24,884 kvm:kvm_inj_virq                                             [99.98%]
                       0 kvm:kvm_inj_exception                                        [99.98%]
                       0 kvm:kvm_page_fault                                           [99.98%]
                   1,986 kvm:kvm_msr                                                  [99.98%]
                       0 kvm:kvm_cr                                                   [99.98%]
                       0 kvm:kvm_pic_set_irq                                          [99.98%]
                       0 kvm:kvm_apic_ipi                                             [99.99%]
                  25,953 kvm:kvm_apic_accept_irq                                      [99.99%]
                  26,132 kvm:kvm_eoi                                                  [99.99%]
                  26,593 kvm:kvm_pv_eoi                                               [99.99%]
                       0 kvm:kvm_nested_vmrun                                         [99.99%]
                       0 kvm:kvm_nested_intercepts                                    [99.99%]
                       0 kvm:kvm_nested_vmexit                                        [99.99%]
                       0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                       0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                       0 kvm:kvm_invlpga                                              [99.99%]
                       0 kvm:kvm_skinit                                               [99.99%]
                     284 kvm:kvm_emulate_insn                                         [99.99%]
                      68 kvm:vcpu_match_mmio                                          [99.99%]
                      68 kvm:kvm_userspace_exit                                       [99.99%]
                       2 kvm:kvm_set_irq                                              [99.99%]
                       2 kvm:kvm_ioapic_set_irq                                       [99.99%]
                  28,288 kvm:kvm_msi_set_irq                                          [99.99%]
                       1 kvm:kvm_ack_irq                                              [99.99%]
                     131 kvm:kvm_mmio                                                 [100.00%]
                     588 kvm:kvm_fpu                                                  [100.00%]
                       0 kvm:kvm_age_page                                             [100.00%]
                       0 kvm:kvm_try_async_get_page                                    [100.00%]
                       0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                       0 kvm:kvm_async_pf_not_present                                    [100.00%]
                       0 kvm:kvm_async_pf_ready                                       [100.00%]
                       0 kvm:kvm_async_pf_completed
      
             1.002039622 seconds time elapsed
      
      We see that # of exits is almost halved.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      ab9cf499
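      The guest-side fast path, sketched from the patch (KVM_PV_EOI_BIT and
      kvm_guest_apic_eoi_write() are the real names):
      
      	static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
      
      	static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
      	{
      		/*
      		 * One test-and-clear: if the host set the "no EOI required"
      		 * bit after injecting the interrupt, clear it and skip the
      		 * EOI write.  Clearing the bit is what lets the host detect
      		 * interrupt nesting.
      		 */
      		if (__test_and_clear_bit(KVM_PV_EOI_BIT, &__get_cpu_var(kvm_apic_eoi)))
      			return;
      		apic_write(APIC_EOI, APIC_EOI_ACK);
      	}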
  37. 06 May 2012, 1 commit