1. 29 3月, 2014 1 次提交
    • C
      x86, CMCI: Add proper detection of end of CMCI storms · 27f6c573
      Chen, Gong 提交于
      When CMCI storm persists for a long time(at least beyond predefined
      threshold. It's 30 seconds for now), we can watch CMCI storm is
      detected immediately after it subsides.
      
      ...
      Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode
      Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode
      Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode
      Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode
      ...
      
      The problem is that our logic that determines that the storm has
      ended is incorrect. We announce the end, re-enable interrupts and
      realize that the storm is still going on, so we switch back to
      polling mode. Rinse, repeat.
      
      When a storm happens we disable signaling of errors via CMCI and begin
      polling machine check banks instead. If we find any logged errors,
      then we need to set a per-cpu flag so that our per-cpu tests that
      check whether the storm is ongoing will see that errors are still
      being logged independently of whether mce_notify_irq() says that the
      error has been fully processed.
      
      cmci_clear() is not the right tool to disable a bank. It disables the
      interrupt for the bank as desired, but it also clears the bit for
      this bank in "mce_banks_owned" so we will skip the bank when polling
      (so we fail to see that the storm continues because we stop looking).
      New cmci_storm_disable_banks() just disables the interrupt while
      allowing polling to continue.
      Reported-by: NWilliam Dauchy <wdauchy@gmail.com>
      Signed-off-by: NChen, Gong <gong.chen@linux.intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      27f6c573
  2. 19 3月, 2014 1 次提交
    • B
      Revert "[PATCH] Insert GART region into resource map" · 707d4eef
      Bjorn Helgaas 提交于
      This reverts commit 56dd669a, which makes the GART visible in
      /proc/iomem.  This fixes a regression: e501b3d8 ("agp: Support 64-bit
      APBASE") exposed an existing problem with a conflict between the GART
      region and a PCI BAR region.
      
      The GART addresses are bus addresses, not CPU addresses, and therefore
      should not be inserted in iomem_resource.
      
      On many machines, the GART region is addressable by the CPU as well as by
      an AGP master, but CPU addressability is not required by the spec.  On some
      of these machines, the GART is mapped by a PCI BAR, and in that case, the
      PCI core automatically inserts it into iomem_resource, just as it does for
      all BARs.
      
      Inserting it here means we'll have a conflict if the PCI core later tries
      to claim the GART region, so let's drop the insertion here.
      
      The conflict indirectly causes X failures, as reported by Jouni in the
      bugzilla below.  We detected the conflict even before e501b3d8, but
      after it the AGP code (fix_northbridge()) uses the PCI resource (which is
      zeroed because of the conflict) instead of reading the BAR again.
      
      Conflicts:
      	arch/x86_64/kernel/aperture.c
      
      Fixes: e501b3d8 agp: Support 64-bit APBASE
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=72201Reported-and-tested-by: NJouni Mettälä <jtmettala@gmail.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      707d4eef
  3. 14 3月, 2014 1 次提交
  4. 12 3月, 2014 2 次提交
    • S
      x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPU · 731bd6a9
      Suresh Siddha 提交于
      For non-eager fpu mode, thread's fpu state is allocated during the first
      fpu usage (in the context of device not available exception). This
      (math_state_restore()) can be a blocking call and hence we enable
      interrupts (which were originally disabled when the exception happened),
      allocate memory and disable interrupts etc.
      
      But the eager-fpu mode, call's the same math_state_restore() from
      kernel_fpu_end(). The assumption being that tsk_used_math() is always
      set for the eager-fpu mode and thus avoid the code path of enabling
      interrupts, allocating fpu state using blocking call and disable
      interrupts etc.
      
      But the below issue was noticed by Maarten Baert, Nate Eldredge and
      few others:
      
      If a user process dumps core on an ecrypt fs while aesni-intel is loaded,
      we get a BUG() in __find_get_block() complaining that it was called with
      interrupts disabled; then all further accesses to our ecrypt fs hang
      and we have to reboot.
      
      The aesni-intel code (encrypting the core file that we are writing) needs
      the FPU and quite properly wraps its code in kernel_fpu_{begin,end}(),
      the latter of which calls math_state_restore(). So after kernel_fpu_end(),
      interrupts may be disabled, which nobody seems to expect, and they stay
      that way until we eventually get to __find_get_block() which barfs.
      
      For eager fpu, most the time, tsk_used_math() is true. At few instances
      during thread exit, signal return handling etc, tsk_used_math() might
      be false.
      
      In kernel_fpu_end(), for eager-fpu, call math_state_restore()
      only if tsk_used_math() is set. Otherwise, don't bother. Kernel code
      path which cleared tsk_used_math() knows what needs to be done
      with the fpu state.
      Reported-by: NMaarten Baert <maarten-baert@hotmail.com>
      Reported-by: NNate Eldredge <nate@thatsmathematics.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSuresh Siddha <sbsiddha@gmail.com>
      Link: http://lkml.kernel.org/r/1391410583.3801.6.camel@europa
      Cc: George Spelvin <linux@horizon.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      731bd6a9
    • D
      x86: Remove CONFIG_X86_OOSTORE · 09df7c4c
      Dave Jones 提交于
      This was an optimization that made memcpy type benchmarks a little
      faster on ancient (Circa 1998) IDT Winchip CPUs.  In real-life
      workloads, it wasn't even noticable, and I doubt anyone is running
      benchmarks on 16 year old silicon any more.
      
      Given this code has likely seen very little use over the last decade,
      let's just remove it.
      Signed-off-by: NDave Jones <davej@fedoraproject.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09df7c4c
  5. 11 3月, 2014 1 次提交
  6. 08 3月, 2014 2 次提交
    • L
      x86: fix compile error due to X86_TRAP_NMI use in asm files · b01d4e68
      Linus Torvalds 提交于
      It's an enum, not a #define, you can't use it in asm files.
      
      Introduced in commit 5fa10196 ("x86: Ignore NMIs that come in during
      early boot"), and sadly I didn't compile-test things like I should have
      before pushing out.
      
      My weak excuse is that the x86 tree generally doesn't introduce stupid
      things like this (and the ARM pull afterwards doesn't cause me to do a
      compile-test either, since I don't cross-compile).
      
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b01d4e68
    • H
      x86: Ignore NMIs that come in during early boot · 5fa10196
      H. Peter Anvin 提交于
      Don Zickus reports:
      
      A customer generated an external NMI using their iLO to test kdump
      worked.  Unfortunately, the machine hung.  Disabling the nmi_watchdog
      made things work.
      
      I speculated the external NMI fired, caused the machine to panic (as
      expected) and the perf NMI from the watchdog came in and was latched.
      My guess was this somehow caused the hang.
      
         ----
      
      It appears that the latched NMI stays latched until the early page
      table generation on 64 bits, which causes exceptions to happen which
      end in IRET, which re-enable NMI.  Therefore, ignore NMIs that come in
      during early execution, until we have proper exception handling.
      Reported-and-tested-by: NDon Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/r/1394221143-29713-1-git-send-email-dzickus@redhat.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v3.5+, older with some backport effort
      5fa10196
  7. 05 3月, 2014 1 次提交
    • B
      x86/efi: Quirk out SGI UV · a5d90c92
      Borislav Petkov 提交于
      Alex reported hitting the following BUG after the EFI 1:1 virtual
      mapping work was merged,
      
       kernel BUG at arch/x86/mm/init_64.c:351!
       invalid opcode: 0000 [#1] SMP
       Call Trace:
        [<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15
        [<ffffffff818a5e20>] uv_system_init+0x22b/0x124b
        [<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d
        [<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7
        [<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd
        [<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7
        [<ffffffff8153d955>] ? printk+0x72/0x74
        [<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6
        [<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb
        [<ffffffff81535530>] ? rest_init+0x74/0x74
        [<ffffffff81535539>] kernel_init+0x9/0xff
        [<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0
        [<ffffffff81535530>] ? rest_init+0x74/0x74
      
      Getting this thing to work with the new mapping scheme would need more
      work, so automatically switch to the old memmap layout for SGI UV.
      Acked-by: NRuss Anderson <rja@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      a5d90c92
  8. 27 2月, 2014 1 次提交
    • P
      perf/x86: Fix event scheduling · 26e61e89
      Peter Zijlstra 提交于
      Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
      with perf WARN_ON()s triggering. He also provided traces of the failures.
      
      This is I think the relevant bit:
      
      	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_disable: x86_pmu_disable
      	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_state: Events: {
      	>    pec_1076_warn-2804  [000] d...   147.926156: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
      	>    pec_1076_warn-2804  [000] d...   147.926158: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926159: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
      	>    pec_1076_warn-2804  [000] d...   147.926161: x86_pmu_state: Assignment: {
      	>    pec_1076_warn-2804  [000] d...   147.926162: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926163: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926166: collect_events: Adding event: 1 (ffff880119ec8800)
      
      So we add the insn:p event (fd[23]).
      
      At this point we should have:
      
        n_events = 2, n_added = 1, n_txn = 1
      
      	>    pec_1076_warn-2804  [000] d...   147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
      	>    pec_1076_warn-2804  [000] d...   147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)
      
      We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
      These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
      that's not visible.
      
      	group_sched_in()
      	  pmu->start_txn() /* nop - BP pmu */
      	  event_sched_in()
      	     event->pmu->add()
      
      So here we should end up with:
      
        0: n_events = 3, n_added = 2, n_txn = 2
        4: n_events = 4, n_added = 3, n_txn = 3
      
      But seeing the below state on x86_pmu_enable(), the must have failed,
      because the 0 and 4 events aren't there anymore.
      
      Looking at group_sched_in(), since the BP is the leader, its
      event_sched_in() must have succeeded, for otherwise we would not have
      seen the sibling adds.
      
      But since neither 0 or 4 are in the below state; their event_sched_in()
      must have failed; but I don't see why, the complete state: 0,0,1:p,4
      fits perfectly fine on a core2.
      
      However, since we try and schedule 4 it means the 0 event must have
      succeeded!  Therefore the 4 event must have failed, its failure will
      have put group_sched_in() into the fail path, which will call:
      
      	event_sched_out()
      	  event->pmu->del()
      
      on 0 and the BP event.
      
      Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
      giving what we see below:
      
       n_event = 2, n_added = 2, n_txn = 2
      
      	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_enable: x86_pmu_enable
      	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_state: Events: {
      	>    pec_1076_warn-2804  [000] d...   147.926179: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
      	>    pec_1076_warn-2804  [000] d...   147.926181: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926182: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
      	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: Assignment: {
      	>    pec_1076_warn-2804  [000] d...   147.926186: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state:   1->0 tag: 1 config: 1 (ffff880119ec8800)
      	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0
      
      So the problem is that x86_pmu_del(), when called from a
      group_sched_in() that fails (for whatever reason), and without x86_pmu
      TXN support (because the leader is !x86_pmu), will corrupt the n_added
      state.
      Reported-and-Tested-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      26e61e89
  9. 26 2月, 2014 1 次提交
  10. 22 2月, 2014 3 次提交
  11. 20 2月, 2014 2 次提交
    • M
      x86: tsc: Add missing Baytrail frequency to the table · 3e11e818
      Mika Westerberg 提交于
      Intel Baytrail is based on Silvermont core so MSR_FSB_FREQ[2:0] == 0 means
      that the CPU reference clock runs at 83.3MHz. Add this missing frequency to
      the table.
      Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Bin Gao <bin.gao@linux.intel.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1392810750-18660-2-git-send-email-mika.westerberg@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      3e11e818
    • T
      x86, tsc: Fallback to normal calibration if fast MSR calibration fails · 5f0e0309
      Thomas Gleixner 提交于
      If we cannot calibrate TSC via MSR based calibration
      try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that
      to the caller. This value gets then propagated further to clockevents
      code resulting division by zero oops like the one below:
      
       divide error: 0000 [#1] PREEMPT SMP
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0+ #47
       task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000
       RIP: 0010:[<ffffffff810aec14>]  [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0
       RSP: 0000:ffff880075507e58  EFLAGS: 00010246
       RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff
       RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be
       R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008
       R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0
       Stack:
        ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88
        ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168
        ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000
       Call Trace:
        [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30
        [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0
        [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8
        [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0
        [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205
        [<ffffffff8177c910>] ? rest_init+0x90/0x90
        [<ffffffff8177c91e>] kernel_init+0xe/0x120
        [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0
        [<ffffffff8177c910>] ? rest_init+0x90/0x90
      
      Prevent this from happening by:
       1) Modifying try_msr_calibrate_tsc() to return calibration value or zero
          if it fails.
       2) Check this return value in native_calibrate_tsc() and in case of zero
          fallback to use normal non-MSR based calibration.
      
      [mw: Added subject and changelog]
      Reported-and-tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Bin Gao <bin.gao@linux.intel.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.comSigned-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      5f0e0309
  12. 13 2月, 2014 1 次提交
  13. 12 2月, 2014 1 次提交
    • S
      ftrace/x86: Use breakpoints for converting function graph caller · 87fbb2ac
      Steven Rostedt (Red Hat) 提交于
      When the conversion was made to remove stop machine and use the breakpoint
      logic instead, the modification of the function graph caller is still
      done directly as though it was being done under stop machine.
      
      As it is not converted via stop machine anymore, there is a possibility
      that the code could be layed across cache lines and if another CPU is
      accessing that function graph call when it is being updated, it could
      cause a General Protection Fault.
      
      Convert the update of the function graph caller to use the breakpoint
      method as well.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org # 3.5+
      Fixes: 08d636b6 "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine"
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      87fbb2ac
  14. 11 2月, 2014 1 次提交
  15. 09 2月, 2014 3 次提交
  16. 07 2月, 2014 1 次提交
  17. 31 1月, 2014 1 次提交
  18. 30 1月, 2014 4 次提交
  19. 28 1月, 2014 1 次提交
  20. 25 1月, 2014 6 次提交
    • M
      mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge · b9a3b4c9
      Mel Gorman 提交于
      There was a large ebizzy performance regression that was
      bisected to commit 611ae8e3 (x86/tlb: enable tlb flush range
      support for x86).  The problem was related to the
      tlb_flushall_shift tuning for IvyBridge which was altered.  The
      problem is that it is not clear if the tuning values for each
      CPU family is correct as the methodology used to tune the values
      is unclear.
      
      This patch uses a conservative tlb_flushall_shift value for all
      CPU families except IvyBridge so the decision can be revisited
      if any regression is found as a result of this change.
      IvyBridge is an exception as testing with one methodology
      determined that the value of 2 is acceptable.  Details are in
      the changelog for the patch "x86: mm: Change tlb_flushall_shift
      for IvyBridge".
      
      One important aspect of this to watch out for is Xen.  The
      original commit log mentioned large performance gains on Xen.
      It's possible Xen is more sensitive to this value if it flushes
      small ranges of pages more frequently than workloads on bare
      metal typically do.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Tested-by: NDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-dyzMww3fqugnhbhgo6Gxmtkw@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b9a3b4c9
    • M
      x86: mm: change tlb_flushall_shift for IvyBridge · f98b7a77
      Mel Gorman 提交于
      There was a large performance regression that was bisected to
      commit 611ae8e3 ("x86/tlb: enable tlb flush range support for
      x86").  This patch simply changes the default balance point
      between a local and global flush for IvyBridge.
      
      In the interest of allowing the tests to be reproduced, this
      patch was tested using mmtests 0.15 with the following
      configurations
      
      	configs/config-global-dhp__tlbflush-performance
      	configs/config-global-dhp__scheduler-performance
      	configs/config-global-dhp__network-performance
      
      Results are from two machines
      
      Ivybridge   4 threads:  Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz
      Ivybridge   8 threads:  Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
      
      Page fault microbenchmark showed nothing interesting.
      
      Ebizzy was configured to run multiple iterations and threads.
      Thread counts ranged from 1 to NR_CPUS*2. For each thread count,
      it ran 100 iterations and each iteration lasted 10 seconds.
      
      Ivybridge 4 threads
                          3.13.0-rc7            3.13.0-rc7
                             vanilla           altshift-v3
      Mean   1     6395.44 (  0.00%)     6789.09 (  6.16%)
      Mean   2     7012.85 (  0.00%)     8052.16 ( 14.82%)
      Mean   3     6403.04 (  0.00%)     6973.74 (  8.91%)
      Mean   4     6135.32 (  0.00%)     6582.33 (  7.29%)
      Mean   5     6095.69 (  0.00%)     6526.68 (  7.07%)
      Mean   6     6114.33 (  0.00%)     6416.64 (  4.94%)
      Mean   7     6085.10 (  0.00%)     6448.51 (  5.97%)
      Mean   8     6120.62 (  0.00%)     6462.97 (  5.59%)
      
      Ivybridge 8 threads
                           3.13.0-rc7            3.13.0-rc7
                              vanilla           altshift-v3
      Mean   1      7336.65 (  0.00%)     7787.02 (  6.14%)
      Mean   2      8218.41 (  0.00%)     9484.13 ( 15.40%)
      Mean   3      7973.62 (  0.00%)     8922.01 ( 11.89%)
      Mean   4      7798.33 (  0.00%)     8567.03 (  9.86%)
      Mean   5      7158.72 (  0.00%)     8214.23 ( 14.74%)
      Mean   6      6852.27 (  0.00%)     7952.45 ( 16.06%)
      Mean   7      6774.65 (  0.00%)     7536.35 ( 11.24%)
      Mean   8      6510.50 (  0.00%)     6894.05 (  5.89%)
      Mean   12     6182.90 (  0.00%)     6661.29 (  7.74%)
      Mean   16     6100.09 (  0.00%)     6608.69 (  8.34%)
      
      Ebizzy hits the worst case scenario for TLB range flushing every
      time and it shows for these Ivybridge CPUs at least that the
      default choice is a poor on.  The patch addresses the problem.
      
      Next was a tlbflush microbenchmark written by Alex Shi at
      http://marc.info/?l=linux-kernel&m=133727348217113 .  It
      measures access costs while the TLB is being flushed.  The
      expectation is that if there are always full TLB flushes that
      the benchmark would suffer and it benefits from range flushing
      
      There are 320 iterations of the test per thread count.  The
      number of entries is randomly selected with a min of 1 and max
      of 512.  To ensure a reasonably even spread of entries, the full
      range is broken up into 8 sections and a random number selected
      within that section.
      
      iteration 1, random number between 0-64
      iteration 2, random number between 64-128 etc
      
      This is still a very weak methodology.  When you do not know
      what are typical ranges, random is a reasonable choice but it
      can be easily argued that the opimisation was for smaller ranges
      and an even spread is not representative of any workload that
      matters.  To improve this, we'd need to know the probability
      distribution of TLB flush range sizes for a set of workloads
      that are considered "common", build a synthetic trace and feed
      that into this benchmark.  Even that is not perfect because it
      would not account for the time between flushes but there are
      limits of what can be reasonably done and still be doing
      something useful.  If a representative synthetic trace is
      provided then this benchmark could be revisited and the shift values retuned.
      
      Ivybridge 4 threads
                              3.13.0-rc7            3.13.0-rc7
                                 vanilla           altshift-v3
      Mean       1       10.50 (  0.00%)       10.50 (  0.03%)
      Mean       2       17.59 (  0.00%)       17.18 (  2.34%)
      Mean       3       22.98 (  0.00%)       21.74 (  5.41%)
      Mean       5       47.13 (  0.00%)       46.23 (  1.92%)
      Mean       8       43.30 (  0.00%)       42.56 (  1.72%)
      
      Ivybridge 8 threads
                               3.13.0-rc7            3.13.0-rc7
                                  vanilla           altshift-v3
      Mean       1         9.45 (  0.00%)        9.36 (  0.93%)
      Mean       2         9.37 (  0.00%)        9.70 ( -3.54%)
      Mean       3         9.36 (  0.00%)        9.29 (  0.70%)
      Mean       5        14.49 (  0.00%)       15.04 ( -3.75%)
      Mean       8        41.08 (  0.00%)       38.73 (  5.71%)
      Mean       13       32.04 (  0.00%)       31.24 (  2.49%)
      Mean       16       40.05 (  0.00%)       39.04 (  2.51%)
      
      For both CPUs, average access time is reduced which is good as
      this is the benchmark that was used to tune the shift values in
      the first place albeit it is now known *how* the benchmark was
      used.
      
      The scheduler benchmarks were somewhat inconclusive.  They
      showed gains and losses and makes me reconsider how stable those
      benchmarks really are or if something else might be interfering
      with the test results recently.
      
      Network benchmarks were inconclusive.  Almost all results were
      flat except for netperf-udp tests on the 4 thread machine.
      These results were unstable and showed large variations between
      reboots.  It is unknown if this is a recent problems but I've
      noticed before that netperf-udp results tend to vary.
      
      Based on these results, changing the default for Ivybridge seems
      like a logical choice.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Tested-by: NDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: NAlex Shi <alex.shi@linaro.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f98b7a77
    • M
      mm, x86: Account for TLB flushes only when debugging · ec659934
      Mel Gorman 提交于
      Bisection between 3.11 and 3.12 fingered commit 9824cf97 ("mm:
      vmstats: tlb flush counters") to cause overhead problems.
      
      The counters are undeniably useful but how often do we really
      need to debug TLB flush related issues?  It does not justify
      taking the penalty everywhere so make it a debugging option.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Tested-by: NDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-XzxjntugxuwpxXhcrxqqh53b@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ec659934
    • M
      x86/uv/nmi: Fix Sparse warnings · 74c93f9d
      Mike Travis 提交于
      Make uv_register_nmi_notifier() and uv_handle_nmi_ping() static
      to address sparse warnings.
      
      Fix problem where uv_nmi_kexec_failed is unused when
      CONFIG_KEXEC is not defined.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Reviewed-by: NHedi Berriche <hedi@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Link: http://lkml.kernel.org/r/20140114162551.480872353@asylum.americas.sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      74c93f9d
    • D
      x86/AMD/NB: Fix amd_set_subcaches() parameter type · 2993ae33
      Dan Carpenter 提交于
      This is under CAP_SYS_ADMIN, but Smatch complains that mask comes
      from the user and the test for "mask > 0xf" can underflow.
      
      The fix is simple: amd_set_subcaches() should hand down not an 'int'
      but an 'unsigned long' like it was originally indended to do.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale-asia.com>
      Link: http://lkml.kernel.org/r/20140121072209.GA22095@elgon.mountainSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2993ae33
    • A
      x86/quirks: Add workaround for AMD F16h Erratum792 · fb53a1ab
      Aravind Gopalakrishnan 提交于
      The workaround for this Erratum is included in AGESA. But BIOSes
      spun only after Jan2014 will have the fix (atleast server
      versions of the chip). The erratum affects both embedded and
      server platforms and since we cannot say with certainity that
      ALL BIOSes on systems out in the field will have the fix, we
      should probably insulate ourselves in case BIOS does not do the
      right thing or someone is using old BIOSes.
      
      Refer to Revision Guide for AMD F16h models 00h-0fh, document 51810
      Rev. 3.04, November2013 for details on the Erratum.
      
      Tested the patch on Fam16h server platform and it works fine.
      Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Cc: <hmh@hmh.eng.br>
      Cc: <Kim.Naru@amd.com>
      Cc: <Suravee.Suthikulpanit@amd.com>
      Cc: <bp@suse.de>
      Cc: <sherry.hurwitz@amd.com>
      Link: http://lkml.kernel.org/r/1390515212-1824-1-git-send-email-Aravind.Gopalakrishnan@amd.com
      [ Minor edits. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      fb53a1ab
  21. 23 1月, 2014 1 次提交
  22. 22 1月, 2014 2 次提交
  23. 17 1月, 2014 1 次提交
  24. 16 1月, 2014 1 次提交
    • R
      perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h · bee09ed9
      Robert Richter 提交于
      On AMD family 10h we see following error messages while waking up from
      S3 for all non-boot CPUs leading to a failed IBS initialization:
      
       Enabling non-boot CPUs ...
       smpboot: Booting Node 0 Processor 1 APIC 0x1
       [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
       perf: IBS APIC setup failed on cpu #1
       process: Switch to broadcast mode on CPU1
       CPU1 is up
       ...
       ACPI: Waking up from system sleep state S3
      
      Reason for this is that during suspend the LVT offset for the IBS
      vector gets lost and needs to be reinialized while resuming.
      
      The offset is read from the IBSCTL msr. On family 10h the offset needs
      to be 1 as offset 0 is used for the MCE threshold interrupt, but
      firmware assings it for IBS to 0 too. The kernel needs to reprogram
      the vector. The msr is a readonly node msr, but a new value can be
      written via pci config space access. The reinitialization is
      implemented for family 10h in setup_ibs_ctl() which is forced during
      IBS setup.
      
      This patch fixes IBS setup after waking up from S3 by adding
      resume/supend hooks for the boot cpu which does the offset
      reinitialization.
      
      Marking it as stable to let distros pick up this fix.
      Signed-off-by: NRobert Richter <rric@kernel.org>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> v3.2..
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bee09ed9