1. 10 Oct, 2011 5 commits
    • D
      x86, nmi: Add in logic to handle multiple events and unknown NMIs · b227e233
      Don Zickus committed
      Previous patches allow the NMI subsystem to process multiple NMI events
      in one NMI.  As previously discussed this can cause issues when an event
      triggered another NMI but is processed in the current NMI.  This causes the
      next NMI to go unprocessed and become an 'unknown' NMI.
      
      To handle this, we first have to flag whether the NMI handler handled
      more than one event.  If it did, then there exists a chance that
      the next NMI might be already processed.  Once the NMI is flagged as a
      candidate to be swallowed, we next look for a back-to-back NMI condition.
      
      This is determined by looking at the %rip from pt_regs.  If it is the same
      as the previous NMI, it is assumed the cpu did not have a chance to jump
      back into a non-NMI context and execute code and instead handled another NMI.
      
      If both of those conditions are true then we will swallow any unknown NMI.
      
      There still exists a chance that we accidentally swallow a real unknown NMI,
      but for now things seem better.
      
      An optimization has also been added to the nmi notifier routine.  Because x86
      can latch up to one NMI while currently processing an NMI, we don't have to
      worry about executing _all_ the handlers in a standalone NMI.  The idea is
      if multiple NMIs come in, the second NMI will represent them.  For those
      back-to-back NMI cases, we have the potential to drop NMIs.  Therefore only
      execute all the handlers in the second half of a detected back-to-back NMI.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
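The two conditions described in the commit above (the previous NMI handled more than one event, and the interrupted %rip is unchanged) can be illustrated with a small userspace sketch. This is not the kernel code from the patch; the struct and function names below are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU bookkeeping; names are illustrative, not the kernel's. */
struct nmi_swallow_state {
    bool     swallow_next;  /* previous NMI handled more than one event */
    uint64_t last_rip;      /* %rip interrupted by the previous NMI */
};

/* Record the outcome of an NMI whose handlers claimed 'handled' events. */
static void nmi_note_handled(struct nmi_swallow_state *st, uint64_t rip,
                             int handled)
{
    st->swallow_next = handled > 1;
    st->last_rip = rip;
}

/* Decide whether an NMI that no handler claimed may be dropped silently. */
static bool nmi_should_swallow_unknown(struct nmi_swallow_state *st,
                                       uint64_t rip)
{
    /* Same %rip as last time: assume no non-NMI code ran in between. */
    bool back_to_back = (rip == st->last_rip);
    bool swallow = st->swallow_next && back_to_back;

    /* The candidate flag covers at most one following NMI. */
    st->swallow_next = false;
    st->last_rip = rip;
    return swallow;
}
```

As the commit message notes, a check of this kind can still swallow a genuine unknown NMI that happens to arrive at the same %rip.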
    • D
      x86, nmi: Wire up NMI handlers to new routines · 9c48f1c6
      Don Zickus committed
      Just convert all the files that have an nmi handler to the new routines.
      Most of it is a straightforward conversion.  A couple of places needed some
      tweaking, like kgdb, which separates the debug notifier from the nmi handler,
      and mce, which removes a call to notify_die.
      
      [Thanks to Ying for finding out the history behind that mce call
      
      https://lkml.org/lkml/2010/5/27/114
      
      And Boris responding that he would like to remove that call because of it
      
      https://lkml.org/lkml/2011/9/21/163]
      
      The things that get converted are the registration/unregistration routines
      and the nmi handler itself, which has its args changed along with code removal
      to check which list it is on (most are on one NMI list except for kgdb
      which has both an NMI routine and an NMI Unknown routine).
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Corey Minyard <minyard@acm.org>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Corey Minyard <minyard@acm.org>
      Cc: Jack Steiner <steiner@sgi.com>
      Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • D
      x86, nmi: Create new NMI handler routines · c9126b2e
      Don Zickus committed
      The NMI handlers used to rely on the notifier infrastructure.  This worked
      great until we wanted to support handling multiple events better.
      
      One of the key ideas to the nmi handling is to process _all_ the handlers for
      each NMI.  The reason behind this switch is because NMIs are edge triggered.
      If enough NMIs are triggered, then they could be lost because the cpu can
      only latch at most one NMI (besides the one currently being processed).
      
      In order to deal with this we have decided to process all the NMI handlers
      for each NMI.  This allows the handlers to determine if they received an
      event or not (the ones that can not determine this will be left to fend
      for themselves on the unknown NMI list).
      
      As a result of this change it is now possible to have an extra NMI that
      was destined to be received for an already processed event.  Because the
      event was processed in the previous NMI, this NMI gets dropped and becomes
      an 'unknown' NMI.  This of course will cause printks that scare people.
      
      However, we prefer to have extra NMIs as opposed to losing NMIs and as such
      have developed a basic mechanism to catch most of them.  That will be
      a later patch.
      
      To accomplish this idea, I unhooked the nmi handlers from the notifier
      routines and created a new mechanism loosely based on doIRQ.  The reason
      for this is the notifier routines have a couple of shortcomings.  First, we
      couldn't guarantee all future NMI handlers used NOTIFY_OK instead of
      NOTIFY_STOP.  Second, we couldn't keep track of the number of events being
      handled in each routine (most only handle one, perf can handle more than one).
      Third, I wanted to eventually display which nmi handlers are registered in
      the system in /proc/interrupts to help see who is generating NMIs.
      
      The patch below just implements the new infrastructure but doesn't wire it up
      yet (that is the next patch).  Its design is based on doIRQ structs and the
      atomic notifier routines.  So the rcu stuff in the patch isn't entirely untested
      (as the notifier routines have soaked it) but it should be double checked in
      case I copied the code wrong.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317409584-23662-3-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
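The core idea in the commit above, running every registered handler for each NMI and summing how many events each one claimed, can be sketched in plain C as follows. This is a simplified userspace illustration only: it omits the RCU protection the commit mentions, and all type and function names here are hypothetical rather than taken from the patch.

```c
#include <stddef.h>

/* A handler returns the number of events it handled (0 if none were its own). */
typedef int (*nmi_handler_fn)(void *ctx);

/* Hypothetical list node, loosely in the spirit of an irqaction-style struct. */
struct nmi_action {
    nmi_handler_fn     handler;
    void              *ctx;
    const char        *name;  /* could later identify the handler to the user */
    struct nmi_action *next;
};

/* Add a handler to the front of the list (no locking in this sketch). */
static void nmi_register(struct nmi_action **head, struct nmi_action *a)
{
    a->next = *head;
    *head = a;
}

/* Run _all_ handlers; no handler can stop the walk early. */
static int nmi_run_all(struct nmi_action *head)
{
    int handled = 0;

    for (struct nmi_action *a = head; a != NULL; a = a->next)
        handled += a->handler(a->ctx);
    return handled;
}

/* Example handler: claims every pending event on its simulated device. */
static int example_handler(void *ctx)
{
    int *pending = ctx;
    int n = *pending;

    *pending = 0;
    return n;
}
```

Returning a count instead of a NOTIFY_OK/NOTIFY_STOP style code is what lets the caller tell that one NMI covered more than one event.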
    • D
      x86, nmi: Split out nmi from traps.c · 1d48922c
      Don Zickus committed
      The nmi stuff is changing a lot and adding more functionality.  Split it
      out from the traps.c file so it doesn't continue to pollute that file.
      
      This makes it easier to find and expand all the future nmi related work.
      
      No real functional changes here.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317409584-23662-2-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • G
      perf, intel: Use GO/HO bits in perf-ctr · 144d31e6
      Gleb Natapov committed
      Intel does not have guest/host-only bit in perf counters like AMD
      does.  To support GO/HO bits KVM needs to switch EVENTSELn values
      (or PERF_GLOBAL_CTRL if available) at a guest entry. If a counter is
      configured to count only in a guest mode it stays disabled in a host,
      but VMX is configured to switch it to enabled value during guest entry.
      
      This patch adds GO/HO tracking to Intel perf code and provides interface
      for KVM to get a list of MSRs that need to be switched on a guest entry.
      
      Only cpus with architectural PMU (v1 or later) are supported with this
      patch.  To my knowledge there are no p6 models with VMX but without
      architectural PMU, and p4 with VMX are rare, and the interface is general
      enough to support them if the need arises.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317816084-18026-7-git-send-email-gleb@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
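To make the guest-only/host-only switching in the commit above concrete, here is a minimal sketch of how the two values of a global enable mask could be derived from per-counter GO/HO bitmasks. It assumes one bit per counter, and the struct and function names are hypothetical; it does not reproduce the interface the patch actually exposes to KVM.

```c
#include <stdint.h>

/* Hypothetical pair of values for one MSR: one for host, one for guest. */
struct msr_switch {
    uint64_t host;   /* value to load while running the host */
    uint64_t guest;  /* value to load at guest entry */
};

/*
 * enabled:    bitmask of counters that are enabled at all
 * guest_only: counters that must count only in guest mode
 * host_only:  counters that must count only in host mode
 */
static struct msr_switch global_ctrl_values(uint64_t enabled,
                                            uint64_t guest_only,
                                            uint64_t host_only)
{
    struct msr_switch v;

    v.host  = enabled & ~guest_only;  /* guest-only counters stay off in host */
    v.guest = enabled & ~host_only;   /* host-only counters stay off in guest */
    return v;
}
```

A counter with neither bit set is enabled in both values, so it would keep counting across guest entry and exit.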
  2. 06 Oct, 2011 1 commit
  3. 28 Sep, 2011 1 commit
  4. 26 Sep, 2011 1 commit
  5. 21 Sep, 2011 1 commit
    • M
      x86/rtc: Don't recursively acquire rtc_lock · 47997d75
      Matt Fleming committed
      A deadlock was introduced on x86 in commit ef68c8f8 ("x86:
      Serialize EFI time accesses on rtc_lock") because efi_get_time()
      and friends can be called with rtc_lock already held by
      read_persistent_clock(), e.g.:
      
       timekeeping_init()
          read_persistent_clock()     <-- acquire rtc_lock
              efi_get_time()
                  phys_efi_get_time() <-- acquire rtc_lock <DEADLOCK>
      
      To fix this let's push the locking down into the get_wallclock()
      and set_wallclock() implementations.  Only the clock
      implementations that access the x86 RTC directly need to acquire
      rtc_lock, so it makes sense to push the locking down into the
      rtc, vrtc and efi code.
      
      The virtualization implementations don't require rtc_lock to be
      held because they provide their own serialization.
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Acked-by: Jan Beulich <jbeulich@novell.com>
      Acked-by: Avi Kivity <avi@redhat.com> [for the virtualization aspect]
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Josh Boyer <jwboyer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
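The deadlock and its fix in the commit above can be modeled with a toy non-recursive lock that records a double acquisition instead of hanging. This is a userspace sketch under stated assumptions: the lock, the `_model` functions, and the returned time value are all hypothetical stand-ins, not the kernel's rtc_lock or EFI code.

```c
#include <stdbool.h>

/* Toy non-recursive lock: flags a deadlock instead of spinning forever. */
struct toy_lock {
    bool held;
    bool deadlocked;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    if (l->held)
        l->deadlocked = true;  /* a real spinlock would hang here */
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    l->held = false;
}

static struct toy_lock rtc_lock_model;

/* Backend that touches the RTC: it takes the lock itself. */
static unsigned long efi_get_time_model(void)
{
    unsigned long t;

    toy_lock_acquire(&rtc_lock_model);
    t = 42UL;  /* arbitrary placeholder for a wall-clock reading */
    toy_lock_release(&rtc_lock_model);
    return t;
}

/* Before the fix: the caller also holds the lock around the backend call. */
static unsigned long read_clock_locked_caller(void)
{
    unsigned long t;

    toy_lock_acquire(&rtc_lock_model);
    t = efi_get_time_model();  /* second acquire: deadlock */
    toy_lock_release(&rtc_lock_model);
    return t;
}

/* After the fix: locking lives only in the backends that need it. */
static unsigned long read_clock_lock_pushed_down(void)
{
    return efi_get_time_model();
}
```

With the lock pushed down, a backend that provides its own serialization simply never touches the lock.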
  6. 14 Sep, 2011 1 commit
    • H
      x86, mce: Do not call del_timer_sync() in IRQ context · 9aaef96f
      Hidetoshi Seto committed
      del_timer_sync() can cause a deadlock when called in interrupt context.
      It is used with on_each_cpu() in some parts for sysfs files like bank*,
      check_interval, cmci_disabled and ignore_ce.
      
      However, use of on_each_cpu() results in calling the function passed
      as the argument in interrupt context. This causes a flood of nested
      warnings from del_timer_sync() (it runs on each CPU) caused even by a
      simple file access like:
      
      $ echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval
      
      Fortunately, these MCE-specific files are rarely used and AFAIK only a few
      MCE geeks experience this warning.
      
      To remove the warning, move timer deletion outside of the interrupt
      context.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
  7. 31 Aug, 2011 1 commit
    • A
      x86, perf: Check that current->mm is alive before getting user callchain · 20afc60f
      Andrey Vagin committed
      An event may occur when an mm is already released.
      
      I added an event in dequeue_entity() and caught a panic with
      the following backtrace:
      
      [  434.421110] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      [  434.421258] IP: [<ffffffff810464ac>] __get_user_pages_fast+0x9c/0x120
      ...
      [  434.421258] Call Trace:
      [  434.421258]  [<ffffffff8101ae81>] copy_from_user_nmi+0x51/0xf0
      [  434.421258]  [<ffffffff8109a0d5>] ? sched_clock_local+0x25/0x90
      [  434.421258]  [<ffffffff8101b048>] perf_callchain_user+0x128/0x170
      [  434.421258]  [<ffffffff811154cd>] ? __perf_event_header__init_id+0xed/0x100
      [  434.421258]  [<ffffffff81116690>] perf_prepare_sample+0x200/0x280
      [  434.421258]  [<ffffffff81118da8>] __perf_event_overflow+0x1b8/0x290
      [  434.421258]  [<ffffffff81065240>] ? tg_shares_up+0x0/0x670
      [  434.421258]  [<ffffffff8104fe1a>] ? walk_tg_tree+0x6a/0xb0
      [  434.421258]  [<ffffffff81118f44>] perf_swevent_overflow+0xc4/0xf0
      [  434.421258]  [<ffffffff81119150>] do_perf_sw_event+0x1e0/0x250
      [  434.421258]  [<ffffffff81119204>] perf_tp_event+0x44/0x70
      [  434.421258]  [<ffffffff8105701f>] ftrace_profile_sched_block+0xdf/0x110
      [  434.421258]  [<ffffffff8106121d>] dequeue_entity+0x2ad/0x2d0
      [  434.421258]  [<ffffffff810614ec>] dequeue_task_fair+0x1c/0x60
      [  434.421258]  [<ffffffff8105818a>] dequeue_task+0x9a/0xb0
      [  434.421258]  [<ffffffff810581e2>] deactivate_task+0x42/0xe0
      [  434.421258]  [<ffffffff814bc019>] thread_return+0x191/0x808
      [  434.421258]  [<ffffffff81098a44>] ? switch_task_namespaces+0x24/0x60
      [  434.421258]  [<ffffffff8106f4c4>] do_exit+0x464/0x910
      [  434.421258]  [<ffffffff8106f9c8>] do_group_exit+0x58/0xd0
      [  434.421258]  [<ffffffff8106fa57>] sys_exit_group+0x17/0x20
      [  434.421258]  [<ffffffff8100b202>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Link: http://lkml.kernel.org/r/1314693156-24131-1-git-send-email-avagin@openvz.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
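The guard named in the commit title above amounts to refusing to walk user memory when the task no longer has an address space. A minimal sketch of that check follows, using hypothetical stand-in types rather than the kernel's task_struct and mm_struct; the frame walk itself is faked.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mm_struct and task_struct. */
struct mm_model {
    int unused;
};

struct task_model {
    struct mm_model *mm;  /* NULL once the address space has been released */
};

/*
 * Returns the number of user frames recorded. The walk is faked here:
 * a live task yields one frame, an exiting task yields none.
 */
static int user_callchain_model(const struct task_model *task, int max_frames)
{
    /* An event can fire during exit, after the mm is gone: bail out early. */
    if (task->mm == NULL)
        return 0;

    return max_frames > 0 ? 1 : 0;
}
```

Without the early return, the backtrace in the commit shows the walk reaching __get_user_pages_fast() and dereferencing a NULL pointer.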
  8. 27 Aug, 2011 1 commit
  9. 26 Aug, 2011 2 commits
  10. 14 Aug, 2011 1 commit
  11. 11 Aug, 2011 2 commits
  12. 09 Aug, 2011 1 commit
  13. 06 Aug, 2011 1 commit
  14. 05 Aug, 2011 4 commits
  15. 04 Aug, 2011 2 commits
  16. 27 Jul, 2011 1 commit
  17. 24 Jul, 2011 2 commits
  18. 22 Jul, 2011 6 commits
  19. 21 Jul, 2011 4 commits
  20. 16 Jul, 2011 1 commit
  21. 15 Jul, 2011 1 commit
    • O
      x86: Kill handle_signal()->set_fs() · 73d382de
      Oleg Nesterov committed
      handle_signal()->set_fs() has a nice comment which explains what
      set_fs() is, but it doesn't explain why it is needed and why it
      depends on CONFIG_X86_64.
      
      Afaics, the history of this confusion is:
      
      	1. I guess today nobody can explain why it was needed
      	   in arch/i386/kernel/signal.c, perhaps it was always
      	   wrong. This predates 2.4.0 kernel.
      
      	2. then it was copy-and-pasted to the new x86_64 arch.
      
      	3. then it was removed from i386 (but not from x86_64)
      	   by b93b6ca3 "i386: remove unnecessary code".
      
      	4. then it was reintroduced under CONFIG_X86_64 when x86
      	   unified i386 and x86_64, because the patch above didn't
      	   touch x86_64.
      
      Remove it. ->addr_limit should be correct. Even if it was possible
      that it is wrong, it is too late to fix it after setup_rt_frame().
      
      Linus commented in:
      http://lkml.kernel.org/r/alpine.LFD.0.999.0707170902570.19166@woody.linux-foundation.org
      
      ... about the equivalent bit from i386:
      
      Heh. I think it's entirely historical.
      
      Please realize that the whole reason that function is called "set_fs()" is 
      that it literally used to set the %fs segment register, not 
      "->addr_limit".
      
      So I think the "set_fs(USER_DS)" is there _only_ to match the other
      
              regs->xds = __USER_DS;
              regs->xes = __USER_DS;
              regs->xss = __USER_DS;
              regs->xcs = __USER_CS;
      
      things, and never mattered. And now it matters even less, and has been 
      copied to all other architectures where it is just totally insane.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/20110710164424.GA20261@redhat.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>