1. 10 October 2011, 6 commits
    • perf, x86: Implement IBS initialization · b7169166
      Committed by Robert Richter
      This patch implements IBS feature detection and initialization. The
      code is shared between perf and oprofile. If IBS is available on the
      system, a PMU is set up for perf.
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1316597423-25723-3-git-send-email-robert.richter@amd.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
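      IBS availability is advertised via CPUID. As a minimal user-space sketch of
      the detection step (assuming an AMD CPU; the IBS flag is bit 10 of ECX for
      CPUID leaf 0x80000001):

        /* Minimal sketch: detect AMD IBS support from CPUID. */
        #include <stdio.h>
        #include <cpuid.h>

        int main(void)
        {
                unsigned int eax, ebx, ecx, edx;

                /* Leaf 0x80000001, ECX bit 10 is the AMD IBS feature flag. */
                if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
                        return 1;
                printf("IBS %savailable\n", (ecx & (1u << 10)) ? "" : "not ");
                return 0;
        }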
    • perf, x86: Share IBS macros between perf and oprofile · ee5789db
      Committed by Robert Richter
      Move the IBS macros from oprofile to <asm/perf_event.h> to make them
      available to perf. No additional changes.
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1316597423-25723-2-git-send-email-robert.richter@amd.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
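      For illustration, the shared definitions are of this shape. The MSR
      addresses and bit positions below follow the AMD64 IBS documentation but
      should be treated as an assumption of this sketch, not a quote of the
      header:

        /* Sketch of the IBS definitions shared via <asm/perf_event.h>.
         * Verify values against the real header before relying on them. */
        #define MSR_AMD64_IBSFETCHCTL   0xc0011030
        #define MSR_AMD64_IBSOPCTL      0xc0011033
        #define IBS_FETCH_ENABLE        (1ULL << 48)   /* enable fetch sampling */
        #define IBS_OP_ENABLE           (1ULL << 17)   /* enable op sampling */
        #define IBS_OP_MAX_CNT          0x0000ffffULL  /* op sample period field */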
    • x86, nmi: Add in logic to handle multiple events and unknown NMIs · b227e233
      Committed by Don Zickus
      Previous patches allow the NMI subsystem to process multiple NMI events
      in one NMI.  As previously discussed, this can cause issues when an event
      triggers another NMI but is processed in the current NMI.  This causes the
      next NMI to go unprocessed and become an 'unknown' NMI.
      
      To handle this, we first have to flag whether or not the NMI handler handled
      more than one event.  If it did, then there exists a chance that
      the next NMI might already have been processed.  Once the NMI is flagged as a
      candidate to be swallowed, we next look for a back-to-back NMI condition.
      
      This is determined by looking at the %rip from pt_regs.  If it is the same
      as in the previous NMI, it is assumed the cpu did not have a chance to jump
      back into a non-NMI context and execute code, and instead handled another NMI.
      
      If both of those conditions are true then we will swallow any unknown NMI.
      
      There still exists a chance that we accidentally swallow a real unknown NMI,
      but for now things seem better.
      
      An optimization has also been added to the nmi notifier routine.  Because x86
      can latch up to one NMI while currently processing an NMI, we don't have to
      worry about executing _all_ the handlers in a standalone NMI.  The idea is that
      if multiple NMIs come in, the second NMI will represent them.  For those
      back-to-back NMI cases, we have the potential to drop NMIs.  Therefore, only
      execute all the handlers in the second half of a detected back-to-back NMI.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
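      A sketch of the swallow heuristic described above; the identifiers are
      illustrative, not the kernel's actual symbols:

        /* Sketch of the back-to-back NMI swallow heuristic. All names are
         * illustrative; report_unknown_nmi() is a hypothetical helper. */
        static unsigned long last_nmi_rip;
        static int swallow_candidate;   /* set when one NMI handled >1 event */

        static void unknown_nmi(struct pt_regs *regs)
        {
                int back_to_back = (regs->ip == last_nmi_rip);

                last_nmi_rip = regs->ip;
                if (swallow_candidate && back_to_back)
                        return;                 /* likely already handled */
                report_unknown_nmi(regs);       /* hypothetical */
        }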
    • x86, nmi: Wire up NMI handlers to new routines · 9c48f1c6
      Committed by Don Zickus
      Just convert all the files that have an nmi handler to the new routines.
      Most of it is straightforward conversion.  A couple of places needed some
      tweaking, like kgdb, which separates the debug notifier from the nmi handler,
      and mce, which removes a call to notify_die.
      
      [Thanks to Ying for finding out the history behind that mce call:
      
      https://lkml.org/lkml/2010/5/27/114
      
      And to Boris for responding that he would like to remove that call because
      of it:
      
      https://lkml.org/lkml/2011/9/21/163]
      
      The things that get converted are the registration/unregistration routines,
      and the nmi handler itself has its args changed, along with code removed
      to check which list it is on (most are on one NMI list, except for kgdb,
      which has both an NMI routine and an NMI Unknown routine).
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Corey Minyard <minyard@acm.org>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Corey Minyard <minyard@acm.org>
      Cc: Jack Steiner <steiner@sgi.com>
      Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
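      A conversion then looks roughly like the following; the signatures match
      the API added by this series as best recalled, so treat the details as
      approximate:

        /* Rough shape of a conversion to the new NMI API; approximate. */
        static int my_nmi_handler(unsigned int type, struct pt_regs *regs)
        {
                /* Return nonzero if this NMI was ours. */
                return my_device_check_nmi(regs);       /* hypothetical */
        }

        static int __init my_init(void)
        {
                /* Old: register_die_notifier(&my_nmi_notifier); */
                return register_nmi_handler(NMI_LOCAL, my_nmi_handler, 0,
                                            "my_device");
        }

        static void __exit my_exit(void)
        {
                unregister_nmi_handler(NMI_LOCAL, "my_device");
        }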
    • x86, nmi: Create new NMI handler routines · c9126b2e
      Committed by Don Zickus
      The NMI handlers used to rely on the notifier infrastructure.  This worked
      great until we wanted to support handling multiple events better.
      
      One of the key ideas of the nmi handling is to process _all_ the handlers for
      each NMI.  The reason behind this switch is that NMIs are edge triggered.
      If enough NMIs are triggered, then they could be lost because the cpu can
      only latch at most one NMI (besides the one currently being processed).
      
      In order to deal with this we have decided to process all the NMI handlers
      for each NMI.  This allows the handlers to determine if they received an
      event or not (the ones that cannot determine this will be left to fend
      for themselves on the unknown NMI list).
      
      As a result of this change it is now possible to have an extra NMI that
      was destined to be received for an already processed event.  Because the
      event was processed in the previous NMI, this NMI gets dropped and becomes
      an 'unknown' NMI.  This of course will cause printks that scare people.
      
      However, we prefer to have extra NMIs as opposed to losing NMIs, and as such
      have developed a basic mechanism to catch most of them.  That will be
      a later patch.
      
      To accomplish this idea, I unhooked the nmi handlers from the notifier
      routines and created a new mechanism loosely based on doIRQ.  The reason
      for this is that the notifier routines have a couple of shortcomings.  First, we
      couldn't guarantee all future NMI handlers used NOTIFY_OK instead of
      NOTIFY_STOP.  Second, we couldn't keep track of the number of events being
      handled in each routine (most only handle one; perf can handle more than one).
      Third, I wanted to eventually display which nmi handlers are registered in
      the system in /proc/interrupts to help see who is generating NMIs.
      
      The patch below just implements the new infrastructure but doesn't wire it up
      yet (that is the next patch).  Its design is based on doIRQ structs and the
      atomic notifier routines.  So the rcu stuff in the patch isn't entirely untested
      (as the notifier routines have soaked it), but it should be double checked in
      case I copied the code wrong.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317409584-23662-3-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
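      A simplified sketch of that design: a per-type list of handler descriptors
      that is always walked in full, with the return values summed (field and
      symbol names approximate the original; RCU and locking are omitted):

        /* Simplified sketch of the nmiaction design; approximate names,
         * RCU/locking omitted, nmi_list[] is a hypothetical list head array. */
        struct nmiaction {
                struct list_head        list;
                nmi_handler_t           handler;
                const char              *name;
        };

        static int nmi_handle(unsigned int type, struct pt_regs *regs)
        {
                struct nmiaction *a;
                int handled = 0;

                /* Walk _all_ handlers so several latched events can be
                 * claimed within a single NMI. */
                list_for_each_entry(a, &nmi_list[type], list)
                        handled += a->handler(type, regs);

                return handled;         /* number of events claimed */
        }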
    • perf, intel: Use GO/HO bits in perf-ctr · 144d31e6
      Committed by Gleb Natapov
      Intel does not have a guest/host-only bit in perf counters like AMD
      does.  To support GO/HO bits, KVM needs to switch EVENTSELn values
      (or PERF_GLOBAL_CTRL if available) at guest entry. If a counter is
      configured to count only in guest mode, it stays disabled in the host,
      but VMX is configured to switch it to an enabled value during guest entry.
      
      This patch adds GO/HO tracking to the Intel perf code and provides an interface
      for KVM to get a list of MSRs that need to be switched at guest entry.
      
      Only cpus with an architectural PMU (v1 or later) are supported by this
      patch.  To my knowledge there are no p6 models with VMX but without an
      architectural PMU, and p4s with VMX are rare; the interface is general
      enough to support them if the need arises.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317816084-18026-7-git-send-email-gleb@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
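      The KVM-facing interface is essentially a list of (MSR, host value, guest
      value) triples to be swapped at VM entry/exit; a sketch, with names
      following the commit's interface as best recalled:

        /* Sketch of the guest-switch MSR interface; treat as approximate. */
        struct perf_guest_switch_msr {
                unsigned int    msr;    /* MSR index */
                u64             host;   /* value to restore on VM exit */
                u64             guest;  /* value to load on VM entry */
        };

        /* perf fills in the array; KVM feeds it to the VMX MSR switching. */
        struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);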
  2. 06 October 2011, 1 commit
  3. 16 September 2011, 1 commit
    • asm alternatives: remove incorrect alignment notes · a7f934d4
      Committed by Linus Torvalds
      On x86-64, they were just wasteful: with the explicitly added (now
      unnecessary) padding, the size of the alternatives structure was 16
      bytes, and an alignment of 8 bytes didn't hurt much.
      
      However, it was still silly, since the natural size and alignment for
      the structure is actually just 12 bytes, 4-byte aligned since commit
      59e97e4d ("x86: Make alternative instruction pointers relative").
      So removing the padding, and removing the extra alignment is just a good
      idea.
      
      On x86-32, the alignment of 4 bytes was correct, but was incorrectly
      hardcoded as 8 bytes in <asm/alternative-asm.h>.  That header file used
      to be an x86-64 only header file, but various unification efforts
      have made it be used for x86-32 too (i.e. the unification of rwlock and
      rwsem).
      
      That in turn caused x86-32 boot failures, because the extra alignment
      would result in random zero-filled words in the altinstructions section,
      causing oopses early at boot when doing alternative instruction
      replacement.
      
      So just remove all the alignment noise entirely.  It's wrong, and it's
      unnecessary.  The section itself is already properly aligned by the
      linker scripts, and all additions to the section had better be of the
      proper 12-byte format, keeping it aligned.  So if the align directive
      were to ever make a difference, that would be an indication of a serious
      bug to begin with.
      Reported-by: Werner Landgraf <w.landgraf@ru.r>
      Acked-by: Andrew Lutomirski <luto@mit.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
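      After 59e97e4d the record is naturally 12 bytes with 4-byte alignment; a
      sketch of the layout (field names approximate the real header):

        /* Sketch of the alternatives record after 59e97e4d; the field names
         * approximate <asm/alternative.h>. */
        struct alt_instr {
                s32     instr_offset;   /* original instruction, relative */
                s32     repl_offset;    /* replacement instruction, relative */
                u16     cpuid;          /* cpuid feature bit */
                u8      instrlen;       /* length of original instruction */
                u8      replacementlen; /* length of replacement */
        };                              /* 4 + 4 + 2 + 1 + 1 = 12 bytes */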
  4. 30 August 2011, 1 commit
    • KVM: Fix instruction size issue in pvclock scaling · 3b217116
      Committed by Duncan Sands
      Commit de2d1a52 ("KVM: Fix register corruption in pvclock_scale_delta")
      introduced a mul instruction that may have only a memory operand; the
      assembler therefore cannot select the correct size:
      
         pvclock.s:229: Error: no instruction mnemonic suffix given and no register
      operands; can't size instruction
      
      In this example the assembly is:
      
               #APP
               mul -48(%rbp) ; shrd $32, %rdx, %rax
               #NO_APP
      
      A simple solution is to use mulq.
      Signed-off-by: Duncan Sands <baldrick@free.fr>
      Signed-off-by: Avi Kivity <avi@redhat.com>
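      With the suffix made explicit, the x86-64 path of pvclock_scale_delta()
      looks roughly like this (reconstructed from the description, not quoted
      from the patch):

        /* Reconstructed sketch of the fixed x86-64 scaling path. Computes
         * (delta * mul_frac) >> 32 using a 64x64->128 multiply. */
        static inline u64 pvclock_scale_delta(u64 delta, u32 mul_frac)
        {
                u64 product, tmp;

                /* "mulq", not "mul": the qword size is now explicit even
                 * when the operand is in memory. */
                __asm__ (
                        "mulq %[mul_frac] ; shrd $32, %%rdx, %%rax"
                        : "=a" (product), "=d" (tmp)
                        : "0" (delta), [mul_frac] "rm" ((u64)mul_frac));

                return product;
        }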
  5. 27 August 2011, 1 commit
  6. 17 August 2011, 1 commit
    • xen/x86: replace order-based range checking of M2P table by linear one · ccbcdf7c
      Committed by Jan Beulich
      The order-based approach is not only less efficient (requiring a shift
      and a compare, typical generated code looking like this
      
      	mov	eax, [machine_to_phys_order]
      	mov	ecx, eax
      	shr	ebx, cl
      	test	ebx, ebx
      	jnz	...
      
      whereas a direct check requires just a compare, like in
      
      	cmp	ebx, [machine_to_phys_nr]
      	jae	...
      
      ), but also slightly dangerous in the 32-on-64 case - the element
      address calculation can wrap if the next power of two boundary is
      sufficiently far away from the actual upper limit of the table, and
      hence can result in user space addresses being accessed (with it being
      unknown what may actually be mapped there).
      
      Additionally, the elimination of the mistaken use of fls() here (should
      have been __fls()) fixes a latent issue on x86-64 that would trigger
      if the code was run on a system with memory extending beyond the 44-bit
      boundary.
      
      CC: stable@kernel.org
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      [v1: Based on Jeremy's feedback]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
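      In C terms the check changes from an order-based shift to a direct linear
      compare; a sketch (the machine_to_phys_* symbols are the commit's, the
      surrounding accessor is illustrative):

        /* Sketch of the range-check change inside an M2P lookup. */

        /* Before: order-based; needs shift + compare and can overshoot the
         * real table in the 32-on-64 case. */
        if (unlikely(mfn >> machine_to_phys_order))
                return ~0UL;            /* out of range */

        /* After: one linear compare against the actual table size. */
        if (unlikely(mfn >= machine_to_phys_nr))
                return ~0UL;            /* out of range */

        return machine_to_phys_mapping[mfn];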
  7. 11 August 2011, 2 commits
  8. 05 August 2011, 1 commit
  9. 04 August 2011, 2 commits
  10. 27 July 2011, 5 commits
  11. 24 July 2011, 4 commits
  12. 23 July 2011, 1 commit
  13. 22 July 2011, 4 commits
  14. 21 July 2011, 3 commits
  15. 19 July 2011, 1 commit
  16. 15 July 2011, 4 commits
  17. 14 July 2011, 2 commits
    • KVM guest: Add a pv_ops stub for steal time · 3c404b57
      Committed by Glauber Costa
      This patch adds a function pointer to one of the many paravirt_ops
      structs to allow guests to register a steal time function. Besides
      the steal time function, we also declare two jump_labels. They will be
      used to allow the steal time code to be easily bypassed when not
      in use.
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Tested-by: Eric B Munson <emunson@mgebm.net>
      CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Anthony Liguori <aliguori@us.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
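      A sketch of the stub's shape; the names follow the commit description,
      and the jump_label type shown is the 3.x-era one, so treat the details as
      approximate:

        /* Sketch of the steal-time pv_ops hook; approximate. */
        struct pv_time_ops {
                unsigned long long (*sched_clock)(void);
                unsigned long long (*steal_clock)(int cpu);     /* new hook */
        };

        /* Two keys so the accounting paths can be patched out when unused. */
        struct jump_label_key paravirt_steal_enabled;
        struct jump_label_key paravirt_steal_rq_enabled;

        static inline u64 paravirt_steal_clock(int cpu)
        {
                /* The real accessor goes through the PVOP_* call macros. */
                return pv_time_ops.steal_clock(cpu);
        }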
    • KVM: Steal time implementation · c9aaa895
      Committed by Glauber Costa
      To implement steal time, we need the hypervisor to pass the guest
      information about how much time was spent running other processes
      outside the VM, while the vcpu had meaningful work to do - halt
      time does not count.
      
      This information is acquired through the run_delay field of the
      delayacct/schedstats infrastructure, which counts time spent in a
      runqueue but not running.
      
      Steal time is per-cpu information, so the traditional MSR-based
      infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the
      address of the memory area containing the steal time information.
      
      This patch contains the hypervisor part of the steal time infrastructure,
      and can be backported independently of the guest portion.
      
      [avi, yongjie: export delayacct_on, to avoid build failures in some configs]
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Tested-by: Eric B Munson <emunson@mgebm.net>
      CC: Rik van Riel <riel@redhat.com>
      CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Anthony Liguori <aliguori@us.ibm.com>
      Signed-off-by: Yongjie Ren <yongjie.ren@intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
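      The shared area has a fixed, versioned layout so the guest can read it
      tear-free; a sketch that should match the ABI described by the series,
      though the merged headers are authoritative:

        /* Sketch of the guest/host shared steal-time record; verify against
         * the merged headers before relying on the exact layout. */
        struct kvm_steal_time {
                __u64 steal;    /* accumulated nanoseconds stolen (run_delay) */
                __u32 version;  /* odd while the host is updating the record */
                __u32 flags;
                __u32 pad[12];  /* pads the record to 64 bytes */
        };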