1. 25 February 2009 (8 commits)
    • x86, mce, cmci: remove incorrect __cpuinit/__cpuexit annotations · df20e2eb
      H. Peter Anvin committed
      Impact: Bug fix on UP
      
      The MCE code is reinitialized from resume, so we can't use
      __cpuinit/__cpuexit for most of the code.  Remove those annotations
      for anything downstream of mce_init().
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce, cmci: add CMCI support · 88ccbedd
      Andi Kleen committed
      Impact: Major new feature
      
      Intel CMCI (Corrected Machine Check Interrupt) is a new
      feature on Nehalem CPUs. It allows the CPU to trigger
      interrupts on corrected events, which allows faster reaction to
      them than with the traditional polling timer.
      
      Also use CMCI to discover shared banks. Machine check banks
      can be shared by CPU threads or even cores. Using the CMCI enable
      bit it is possible to detect the fact that another CPU already
      saw a specific bank. Use this to assign shared banks only
      to one CPU to avoid reporting duplicated events.
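
      A minimal sketch of this discovery handshake, following the SDM
      algorithm referenced below (the MSR, bit, and bitmap names are
      illustrative, not the exact kernel symbols):

      	static void cmci_discover_sketch(int nr_banks)
      	{
      		int bank;

      		for (bank = 0; bank < nr_banks; bank++) {
      			u64 val;

      			rdmsrl(MSR_IA32_MC0_CTL2 + bank, val);

      			/* Enable bit already set: another CPU owns this bank. */
      			if (val & CMCI_EN)
      				continue;

      			/* Try to claim the bank: enable CMCI, threshold = 1. */
      			wrmsrl(MSR_IA32_MC0_CTL2 + bank, val | CMCI_EN | 1);
      			rdmsrl(MSR_IA32_MC0_CTL2 + bank, val);

      			/* If the bit stuck, this CPU now owns the bank. */
      			if (val & CMCI_EN)
      				__set_bit(bank, __get_cpu_var(mce_banks_owned));
      		}
      	}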
      
      On CPU hot-unplug, bank sharing is rediscovered. This is done
      using a thread that cycles through all the CPUs.
      
      To avoid races between the poller and CMCI we only poll
      for banks that are not CMCI capable and only check CMCI
      owned banks on an interrupt.
      
      The shared bank ownership information is currently only used for
      CMCI interrupts, not polled banks.
      
      The sharing discovery code follows the algorithm recommended in the
      IA32 SDM Vol3a 14.5.2.1
      
      The CMCI interrupt handler just calls the machine check poller to
      pick up the machine check event that caused the interrupt.
      
      I decided not to implement a separate threshold event like
      the AMD version has, because the threshold is always one currently
      and adding another event didn't seem to add any value.
      
      Some code was inspired by Yunhong Jiang's Xen implementation,
      which was in turn inspired by an earlier CMCI implementation
      by me.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce, cmci: define MSR names and fields for new CMCI registers · 03195c6b
      Andi Kleen committed
      Impact: New register definitions only
      
      CMCI means support for raising an interrupt on a corrected machine
      check event instead of having to poll for it. It's a new feature in
      Intel Nehalem CPUs available on some machine check banks.
      
      For details see the IA32 SDM Vol3a 14.5
      
      Define the registers for it as a preparation for further patches.
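
      A sketch of the kind of definitions this adds (values per the SDM;
      treat the exact names as illustrative):

      	#define MSR_IA32_MC0_CTL2	0x280		/* one CTL2 MSR per bank */
      	#define CMCI_EN			(1ULL << 30)	/* enable/ownership bit */
      	#define CMCI_THRESHOLD_MASK	0x7fffULL	/* corrected-count threshold */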
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce, cmci: use polled banks bitmap in machine check poller · ee031c31
      Andi Kleen committed
      Define a per cpu bitmap that contains the banks polled by the machine
      check poller. This is needed for the CMCI code in the next patches
      to be able to disable polling on specific banks.
      
      The bitmap by default contains all banks, so there is no behaviour
      change. Only future code will remove some banks from the polling
      set.
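
      A sketch of the shape of this, assuming the MAX_NR_BANKS limit from
      the companion patch in this series:

      	typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
      	DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);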
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: replace machine check events logged interval with ratelimit · 8457c84d
      Andi Kleen committed
      Impact: behavior change, use common code
      
      Use a standard leaky bucket ratelimit for the machine check
      warning print interval instead of waiting every check_interval.
      Also decrease the limit to twice per minute.
      This interacts better with threshold interrupts because
      they can happen more often than check_interval.
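
      A sketch of the pattern, using the kernel's standard ratelimit
      helpers (the state variable name is illustrative):

      	/* Leaky bucket: at most 2 messages per 60*HZ window. */
      	static DEFINE_RATELIMIT_STATE(ratelimit, 60 * HZ, 2);

      	if (__ratelimit(&ratelimit))
      		printk(KERN_INFO "Machine check events logged\n");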
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce, cmci: avoid potential reentry of threshold interrupt · f9695df4
      Andi Kleen committed
      Impact: minor bugfix
      
      The threshold handler on AMD (and soon on Intel) could be theoretically
      reentered by the hardware. This could lead to corrupted events
      because the machine check poll code assumes it is not reentered.
      
      Move the APIC ACK to the end of the interrupt handler to let
      the hardware avoid that.
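
      A simplified sketch of the resulting handler shape:

      	asmlinkage void mce_threshold_interrupt(void)
      	{
      		exit_idle();
      		irq_enter();
      		/* ... process the threshold event ... */
      		irq_exit();
      		/*
      		 * Ack only at the end, so the hardware cannot raise the
      		 * vector again while the handler is still running.
      		 */
      		ack_APIC_irq();
      	}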
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce, cmci: factor out threshold interrupt handler · b2762686
      Andi Kleen committed
      Impact: cleanup; preparation for feature
      
      The mce_amd_64 code has its own private MC threshold vector with
      its own interrupt handler. Since Intel needs a similar handler, it
      makes sense to share the vector, because both cannot be active at
      the same time.
      
      I factored the common APIC handler code into a separate file which
      can be used by both the Intel and AMD MC code.
      
      This is needed for the next patch which adds an Intel specific
      CMCI handler.
      
      This patch should be a no-op for AMD; it just moves some code
      around.
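
      A sketch of the sharing mechanism: the common vector dispatches
      through a function pointer that whichever vendor code is active
      installs at init time (simplified):

      	static void default_threshold_interrupt(void)
      	{
      		printk(KERN_ERR "Unexpected threshold interrupt\n");
      	}

      	void (*mce_threshold_vector)(void) = default_threshold_interrupt;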
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce, cmci: export MAX_NR_BANKS · 41fdff32
      Andi Kleen committed
      Impact: Cleanup (code movement)
      
      Move MAX_NR_BANKS into mce.h because it's needed there
      for follow-up patches.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  2. 24 February 2009 (1 commit)
  3. 23 February 2009 (2 commits)
  4. 22 February 2009 (2 commits)
  5. 21 February 2009 (1 commit)
    • x86, mce: remove incorrect __cpuinit for mce_cpu_features() · cc3ca220
      H. Peter Anvin committed
      Impact: Bug fix on UP
      
      Checkin 6ec68bff:
          x86, mce: reinitialize per cpu features on resume
      
      introduced a call to mce_cpu_features() in the resume path, in order
      for the MCE machinery to get properly reinitialized after a resume.
      However, this function (and its successors) was flagged __cpuinit,
      which becomes __init on UP configurations (on SMP, suspend/resume
      requires CPU hotplug, so the problem would not be seen there).
      
      Remove the offending __cpuinit annotations for mce_cpu_features() and
      its successor functions.
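
      For reference, a simplified sketch of why __cpuinit is unsafe here:

      	/* include/linux/init.h (simplified) */
      	#ifdef CONFIG_HOTPLUG_CPU
      	#define __cpuinit
      	#else
      	#define __cpuinit __init	/* section discarded after boot */
      	#endif

      Without CPU hotplug the annotated functions are freed once boot
      finishes, so calling them from the resume path jumps into freed
      memory.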
      
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  6. 20 February 2009 (7 commits)
    • x86: use the right protections for split-up pagetables · 07a66d7c
      Ingo Molnar committed
      Steven Rostedt found a bug where, in his modified kernel, ftrace
      was unable to modify the kernel text, due to the PMD itself having
      been marked read-only as well in split_large_page().
      
      The fix, suggested by Linus, is to not try to 'clone' the
      reference protection of a huge-page, but to use the standard
      (and permissive) page protection bits of KERNPG_TABLE.
      
      The 'cloning' makes sense for the ptes but it's a confused and
      incorrect concept at the page table level - because the
      pagetable entry is a set of all ptes and hence cannot
      'clone' any single protection attribute - the ptes can be any
      mixture of protections.
      
      With the permissive KERNPG_TABLE, even if the pte protections
      get changed after this point (due to ftrace doing code-patching
      or other similar activities like kprobes), the resulting combined
      protections will still be correct and the pte's restrictive
      (or permissive) protections will control it.
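
      A sketch of the fix in split_large_page(), assuming the surrounding
      variable names:

      	/*
      	 * Install the split-up pagetable with permissive page table
      	 * protections; the individual ptes keep the real ones:
      	 */
      	__set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE)));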
      
      Also update the comment.
      
      This bug was there for a long time but had not caused visible
      problems before, as it needs a rather large read-only area to
      trigger. Steve presumably hacked his kernel with some really
      large arrays. Anyway, the bug is definitely worth fixing.
      
      [ Huang Ying also experienced problems in this area when writing
        the EFI code, but the real bug in split_large_page() was not
        realized back then. ]
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Huang Ying <ying.huang@intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, vmi: TSC going backwards check in vmi clocksource · 48ffc70b
      Alok N Kataria committed
      Impact: fix time warps under VMware
      
      Similar to the check for TSC going backwards in the TSC clocksource,
      we also need this check for VMI clocksource.
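
      A sketch of the guard, mirroring the native TSC clocksource (the
      function and variable names are illustrative):

      	static cycle_t vmi_clocksource_read(void)
      	{
      		cycle_t ret = vmi_read_cycles();

      		/* Never let the clocksource appear to move backwards. */
      		return ret >= cycles_last ? ret : cycles_last;
      	}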
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
    • x86, mce: use %ll instead of %L for 64-bit numbers · f6d1826d
      H. Peter Anvin committed
      Impact: Cleanup
      
      The standard spelling of a printf pattern for long long is "ll", not
      "L", which is for long double.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: separate correct machine check poller and fatal exception handler · b79109c3
      Andi Kleen committed
      Impact: cleanup, performance enhancement
      
      The machine check poller is diverging more and more from the fatal
      exception handler. Instead of adding more special cases, separate the code
      paths completely. The corrected poll path is actually quite simple,
      and this doesn't result in much code duplication.
      
      This makes both handlers much easier to read and results in
      cleaner code flow.  The exception handler now only needs to care
      about uncorrected errors, which also simplifies the handling of multiple
      errors. The corrected poller also now always runs in standard interrupt
      context and does not need to do anything special to handle NMI context.
      
      Minor behaviour changes:
      - MCG status is now not cleared on polling.
      - Only the banks which had corrected errors get cleared on polling
      - The exception handler only clears banks with errors now
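
      A simplified sketch of the corrected-error poll path described
      above (details and error handling omitted):

      	for (i = 0; i < banks; i++) {
      		struct mce m;

      		mce_setup(&m);
      		rdmsrl(MSR_IA32_MC0_STATUS + i*4, m.status);
      		if (!(m.status & MCI_STATUS_VAL))
      			continue;
      		/* Uncorrected errors are the exception handler's job. */
      		if (m.status & MCI_STATUS_UC)
      			continue;
      		mce_log(&m);
      		/* Clear only banks that actually had an event. */
      		wrmsrl(MSR_IA32_MC0_STATUS + i*4, 0);
      	}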
      
      v2: Forward port to new patch order. Add "uc" argument.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: factor out duplicated struct mce setup into one function · b5f2fa4e
      Andi Kleen committed
      Impact: cleanup
      
      This merely factors out duplicated code to set up
      the initial struct mce state into a single function.
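
      A sketch of the factored-out helper:

      	void mce_setup(struct mce *m)
      	{
      		memset(m, 0, sizeof(struct mce));
      		m->cpu = smp_processor_id();
      		rdtscll(m->tsc);
      	}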
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: implement dynamic machine check banks support · 0d7482e3
      Andi Kleen committed
      Impact: cleanup; making code future proof; memory saving on small systems
      
      This patch replaces the hardcoded max number of machine check banks with 
      dynamic allocation depending on what the CPU reports. The sysfs
      data structures and the banks array are dynamically allocated.
      
      There is still a hard bank limit (128) because the mcelog protocol uses
      banks >= 128 as pseudo banks to escape other events. But we expect
      that 128 banks is beyond any reasonable CPU for now.
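
      A simplified sketch of the sizing logic:

      	u64 cap;

      	rdmsrl(MSR_IA32_MCG_CAP, cap);
      	banks = cap & 0xff;		/* bank count: low 8 bits of MCG_CAP */
      	if (banks > MAX_NR_BANKS)
      		banks = MAX_NR_BANKS;	/* mcelog pseudo banks start at 128 */
      	/* per-bank control values, sized from what the CPU reports */
      	bank = kmalloc(banks * sizeof(u64), GFP_KERNEL);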
      
      This supersedes an earlier patch by Venki, but it solves the problem
      more completely by making the limit fully dynamic (up to the 128
      boundary).
      
      This saves some memory on machines with fewer than 6 banks, because
      they won't need sysdevs for unused ones, and it also allows sysfs to
      be used to control these banks on possible future CPUs with more
      than 6 banks.
      
      This is an updated patch addressing Venki's comments.  I also folded
      in another patch from Thomas which fixed the error allocation path
      (that patch was previously separate).
      
      Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: enable machine checks in 64-bit defconfig · e35849e9
      Andi Kleen committed
      Impact: Low priority fix
      
      The 32-bit defconfig already had it enabled. And it's a pretty
      fundamental feature, so better enable it on 64 bits too.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  7. 19 February 2009 (1 commit)
    • mm: clean up for early_pfn_to_nid() · f2dbcfa7
      KAMEZAWA Hiroyuki committed
      What's happening is that the assertion in mm/page_alloc.c:move_freepages()
      is triggering:
      
      	BUG_ON(page_zone(start_page) != page_zone(end_page));
      
      Once I knew this is what was happening, I added some annotations:
      
      	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
      		printk(KERN_ERR "move_freepages: Bogus zones: "
      		       "start_page[%p] end_page[%p] zone[%p]\n",
      		       start_page, end_page, zone);
      		printk(KERN_ERR "move_freepages: "
      		       "start_zone[%p] end_zone[%p]\n",
      		       page_zone(start_page), page_zone(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
      		       page_to_pfn(start_page), page_to_pfn(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_nid[%d] end_nid[%d]\n",
      		       page_to_nid(start_page), page_to_nid(end_page));
       ...
      
      And here's what I got:
      
      	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
      	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
      	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
      	move_freepages: start_nid[1] end_nid[0]
      
      My memory layout on this box is:
      
      [    0.000000] Zone PFN ranges:
      [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
      [    0.000000] Movable zone start PFN for each node
      [    0.000000] early_node_map[8] active PFN ranges
      [    0.000000]     0: 0x00000000 -> 0x00020000
      [    0.000000]     1: 0x00800000 -> 0x0081f7ff
      [    0.000000]     1: 0x0081f800 -> 0x0081fe50
      [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
      [    0.000000]     1: 0x0081feda -> 0x0081fedb
      [    0.000000]     1: 0x0081fedd -> 0x0081fee5
      [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
      [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
      
      So it's a block move in that 0x81f600-->0x81f7ff region which triggers
      the problem.
      
      This patch:
      
      Declarations of early_pfn_to_nid() are scattered over per-arch
      include files, and it is complicated to know which declaration is
      actually used.  I think that makes the fix for memmap init harder
      than it needs to be.

      This patch moves all declarations to include/linux/mm.h.
      
      After this (see the sketch below):
        if !CONFIG_ARCH_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> use the static definition in include/linux/mm.h
        else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> use the generic definition in mm/page_alloc.c
        else
           -> the per-arch back-end function will be called.
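
      A simplified sketch of the resulting dispatch in include/linux/mm.h:

      	#if !defined(CONFIG_ARCH_POPULATES_NODE_MAP) && \
      	    !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID)
      	static inline int early_pfn_to_nid(unsigned long pfn)
      	{
      		return 0;	/* static fallback: everything on node 0 */
      	}
      	#else
      	/* generic version in mm/page_alloc.c, or the arch back end */
      	extern int early_pfn_to_nid(unsigned long pfn);
      	#endif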
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemloft.net>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 18 February 2009 (11 commits)
    • x86, mce: fix a race condition in mce_read() · ef41df43
      Huang Ying committed
      Impact: bugfix
      
      Consider the following situation:
      
      before: mcelog.next == 1, mcelog.entry[0].finished = 1
      
      +--------------------------------------------------------------------------
      R                   W1                  W2                  W3
      
      read mcelog.next (1)
                          mcelog.next++ (2)
                          (working on entry 1,
                          finished == 0)
      
      mcelog.next = 0
                                              mcelog.next++ (1)
                                              (working on entry 0)
                                                                 mcelog.next++ (2)
                                                                 (working on entry 1)
                              <----------------- race ---------------->
                          (done on entry 1,
                          finished = 1)
                                                                 (done on entry 1,
                                                                 finished = 1)
      
      To fix the race condition, a cmpxchg loop is added to mce_read() to
      ensure that no new MCE record can be added between reading
      mcelog.next and resetting it to 0.
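
      A simplified sketch of the loop:

      	/*
      	 * Reset mcelog.next to 0 only if no writer bumped it since we
      	 * read it; otherwise drain the new entries and try again.
      	 */
      	do {
      		prev = next;
      		/* ... consume entries [0, next) ... */
      		next = cmpxchg(&mcelog.next, prev, 0);
      	} while (next != prev);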
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: disable machine checks on offlined CPUs · d6b75584
      Andi Kleen committed
      Impact: Lower priority bug fix
      
      Offlined CPUs could still get machine checks, but the machine check handler
      cannot handle them properly, leading to an unconditional crash. Disable
      machine checks on CPUs that are going down.
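
      A simplified sketch of the hotplug-down handling:

      	static void mce_disable_cpu(void *h)
      	{
      		if (!mce_available(&current_cpu_data))
      			return;
      		/* An offlined CPU must not react to machine checks. */
      		clear_in_cr4(X86_CR4_MCE);
      	}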
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: don't set up mce sysdev devices with mce=off · 5b4408fd
      Andi Kleen committed
      Impact: bug fix; in this case the resume handler shouldn't run, which
      	avoids incorrectly re-enabling machine checks on resume
      
      When MCEs are completely disabled on the command line, don't set
      up the sysdev devices for them either.
      
      Includes a comment fix from Thomas Gleixner.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: switch machine check polling to per CPU timer · 52d168e2
      Andi Kleen committed
      Impact: Higher priority bug fix
      
      The machine check poller runs a single timer and then broadcasts an
      IPI to all CPUs to check them. This leads to unnecessary
      synchronization between CPUs. The original CPU running the timer has
      to wait, potentially for a long time, for all other CPUs to answer.
      This is also real-time unfriendly and in general inefficient.
      
      This was especially a problem on systems with a lot of events, where
      the poller runs at a higher frequency after processing some events.
      More and more CPU time could be wasted this way, to the point of
      significantly slowing down machines.
      
      The machine check polling is actually fully independent per CPU, so
      there's no reason to not just do this all with per CPU timers.  This
      patch implements that.
      
      Also switch the poller to use standard timers instead of work
      queues. It was using work queues to be able to execute a user program
      on an event, but mce_notify_user() handles that case now with a
      separate callback. So instead always run the poll code in a
      standard per-CPU timer, which means that in the common case of not
      having to execute a trigger there will be less overhead.
      
      This allows the initialization to be cleaned up significantly,
      because standard timers are already up when machine checks get
      initialized; there is no need for multiple initialization functions.
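
      A simplified sketch of the per-CPU timer scheme (including the v3
      WARN_ON):

      	static DEFINE_PER_CPU(struct timer_list, mce_timer);

      	static void mcheck_timer(unsigned long data)
      	{
      		struct timer_list *t = &per_cpu(mce_timer, data);

      		WARN_ON(smp_processor_id() != data);	/* migrated timer? */
      		machine_check_poll();			/* this CPU only */
      		t->expires = round_jiffies(jiffies + check_interval * HZ);
      		add_timer(t);
      	}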
      
      Thanks to Thomas Gleixner for some help.
      
      Cc: thockin@google.com
      v2: Use del_timer_sync() on cpu shutdown and don't try to handle
      migrated timers.
      v3: Add WARN_ON for timer running on unexpected CPU
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: always use separate work queue to run trigger · 9bd98405
      Andi Kleen committed
      Impact: Needed for bug fix in next patch
      
      This relaxes the requirement that mce_notify_user has to run in
      process context. It is useful for future changes, but also leads to
      cleaner behaviour now: mce_notify_user can now be called directly
      from interrupt (but not NMI) context.
      
      The work queue only uses a single global work struct; this is safe
      because the struct is always free for reuse before the trigger
      function is executed. This way no events can be lost.
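
      A simplified sketch of the mechanism; schedule_work() on an
      already-pending item is a no-op, so one static item suffices
      (mce_do_trigger() is the process-context helper that runs the
      user trigger):

      	static DECLARE_WORK(mce_trigger_work, mce_do_trigger);

      	void mce_notify_user(void)
      	{
      		if (test_and_clear_bit(0, &notify_user))
      			schedule_work(&mce_trigger_work);	/* not NMI-safe */
      	}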
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: don't disable machine checks during code patching · 123aa76e
      Andi Kleen committed
      Impact: low priority bug fix
      
      This removes part of a patch I added myself some time ago. After
      some consideration the patch was a bad idea. In particular it stopped
      machine check exceptions during code patching.
      
      To quote the comment:
      
              * MCEs only happen when something got corrupted and in this
              * case we must do something about the corruption.
              * Ignoring it is worse than an unlikely patching race.
              * Also machine checks tend to be broadcast and if one CPU
              * goes into machine check the others follow quickly, so we don't
              * expect a machine check to cause undue problems during code
              * patching.
      
      So undo the machine check related parts of 8f4e956b. NMIs are
      still disabled.
      
      This only removes code; the only addition is a new comment.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: disable machine checks on suspend · 973a2dd1
      Andi Kleen committed
      Impact: Bug fix
      
      During suspend it is not reliable to process machine check
      exceptions, because CPUs disappear but can still get machine check
      broadcasts.  Also the system is slightly more likely to get machine
      checks then, but the handler is typically not in a position to
      handle them in a meaningful way.
      
      So disable them during suspend and enable them during resume.
      
      Also make sure they are always disabled on hot-unplugged CPUs.
      
      This new code assumes that suspend always hot-unplugs all
      non-boot CPUs.
      
      v2: Remove the WARN_ONs Thomas objected to.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown · 07db1c14
      Andi Kleen committed
      Impact: Bugfix
      
      The ifdef around the APIC vector clear on shutdown for the 64-bit
      Intel thermal vector was incorrect and never triggered. Fix that.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: use force_sig_info to kill process in machine check · 380851bc
      Andi Kleen committed
      Impact: bug fix (with tolerant == 3)
      
      do_exit cannot be called directly from the exception handler because
      it can sleep and the exception handler runs on the exception stack.
      Use force_sig() instead.
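
      A simplified sketch of the change, with the surrounding tolerant
      logic assumed:

      	if (user_space) {
      		/* Queue a signal; safe on the exception stack. */
      		force_sig(SIGBUS, current);
      	} else if (panic_on_oops || tolerant < 3) {
      		mce_panic("Uncorrected machine check", &panicm, mcestart);
      	}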
      
      Based on an earlier patch by Ying Huang, who debugged the problem.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: reinitialize per cpu features on resume · 6ec68bff
      Andi Kleen committed
      Impact: Bug fix
      
      This fixes a long-standing bug in the machine check code. On resume
      the boot CPU wouldn't get its vendor-specific state, such as thermal
      handling, reinitialized. This means the boot CPU would never get any
      thermal events reported again.
      
      Call the respective initialization functions on resume
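
      A simplified sketch of the resulting resume hook:

      	static int mce_resume(struct sys_device *dev)
      	{
      		mce_init(NULL);
      		mce_cpu_features(&current_cpu_data);	/* vendor state too */
      		return 0;
      	}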
      
      v2: Remove ancient init, because those don't have a resume device
          anyway. Pointed out by Thomas Gleixner.
      v3: Fix the Subject too, to reflect the v2 change.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, rcu: fix strange load average and ksoftirqd behavior · bf51935f
      Paul E. McKenney committed
      Damien Wyart reported high ksoftirqd CPU usage (20%) on an
      otherwise idle system.
      
      The function-graph trace Damien provided:
      
      >   799.521187 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521371 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521555 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521738 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521934 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522068 |   1)  ksoftir-2324  |               |                rcu_check_callbacks() {
      >   799.522208 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522392 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522575 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522759 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522956 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523074 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
      >   799.523214 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523397 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523579 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523762 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523960 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524079 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
      >   799.524220 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524403 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524587 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524770 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      > [ . . . ]
      
      Shows rcu_check_callbacks() being invoked way too often. It should be called
      once per jiffy, and here it is called no less than 22 times in about
      3.5 milliseconds, meaning one call every 160 microseconds or so.
      
      Why do we need to call rcu_pending() and rcu_check_callbacks() from the
      idle loop of 32-bit x86, especially given that no other architecture does
      this?
      
      The following patch removes the call to rcu_pending() and
      rcu_check_callbacks() from the x86 32-bit idle loop in order to
      reduce the softirq load on idle systems.
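
      The removed per-iteration work was essentially:

      	/* in the 32-bit cpu_idle() loop, once per iteration: */
      	if (rcu_pending(cpu))
      		rcu_check_callbacks(cpu, 0);

      The regular timer tick already invokes rcu_check_callbacks() once per
      jiffy via update_process_times(), so the idle loop does not need to.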
      Reported-by: Damien Wyart <damien.wyart@free.fr>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 16 February 2009 (2 commits)
    • cpumask: fix powernow-k8: partial revert of 2fdf66b4 · a0abd520
      Rusty Russell committed
      Impact: fix powernow-k8 when acpi=off (or other error).
      
      There was a spurious change introduced into powernow-k8 by that
      patch, so that we try to "restore" a cpus_allowed mask that we never
      saved. Revert that file.
      
      See lkml "[PATCH] x86/powernow: fix cpus_allowed brokage when
      acpi=off" from Yinghai for the bug report.
      
      Cc: Mike Travis <travis@sgi.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Ingo Molnar <mingo@elte.hu>
    • trace: mmiotrace to the tracer menu in Kconfig · 6bc5c366
      Pekka Paalanen committed
      Impact: cosmetic change in Kconfig menu layout
      
      This patch was originally suggested by Peter Zijlstra, but it seems
      it was forgotten.
      
      CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable
      directly under the Kernel hacking / debugging menu in the kernel
      configuration system. They were present only for x86 and x86_64.
      
      Other tracers that use the ftrace tracing framework are in their own
      sub-menu. This patch moves the mmiotrace configuration options there.
      Since the Kconfig file, where the tracer menu is, is not architecture
      specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by
      x86/x86_64. CONFIG_MMIOTRACE now depends on it.
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 15 February 2009 (5 commits)