1. 07 Mar 2014, 3 commits
  2. 04 Mar 2014, 2 commits
    • ftrace/x86: One more missing sync after fixup of function modification failure · 12729f14
      Petr Mladek authored
      If a failure occurs while modifying an ftrace function, it bails out and
      removes the tracepoints to restore the code to what it originally was.
      
      What is missing is the final sync run across the CPUs after the fixup is
      done and before the ftrace int3 handler flag is reset.
      
      Here's the description of the problem:
      
      	CPU0				CPU1
      	----				----
        remove_breakpoint();
        modifying_ftrace_code = 0;
      
      				[still sees breakpoint]
      				<takes trap>
      				[sees modifying_ftrace_code as zero]
      				[no breakpoint handler]
      				[goto failed case]
      				[trap exception - kernel breakpoint, no
      				 handler]
      				BUG()
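      
      A hedged sketch of the ordering the fix establishes (names are taken
      from the diagram above; run_sync() is the ftrace helper that forces a
      serializing sync across all CPUs):
      
        /* failure path, sketched */
        remove_breakpoint();           /* restore the original instruction */
        run_sync();                    /* the missing step: make sure no   */
                                       /* CPU can still observe the int3   */
        modifying_ftrace_code = 0;     /* only now drop the handler flag   */
      
      With the sync in place, CPU1 in the diagram can no longer take the trap
      after the flag has been cleared.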
      
      Link: http://lkml.kernel.org/r/1393258342-29978-2-git-send-email-pmladek@suse.cz
      
      Fixes: 8a4d0a68 "ftrace: Use breakpoint method to update ftrace caller"
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Petr Mladek <pmladek@suse.cz>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace/x86: Run a sync after fixup on failure · c932c6b7
      Steven Rostedt (Red Hat) authored
      If a failure occurs while enabling a trace, it bails out and removes the
      tracepoints to restore the code to what it originally was. But the fixup
      had some bugs in it. By injecting a failure into the code, the fixup ran
      to completion, but shortly afterward the system rebooted.
      
      There were two bugs here.
      
      The first was that there was no final sync run across the CPUs after the
      fixup was done, and before the ftrace int3 handler flag was reset. That
      means that other CPUs could still see the breakpoint and trigger on it
      long after the flag was cleared, and the int3 handler would think it was
      a spurious interrupt. Worse yet, the int3 handler could hit other
      breakpoints, because it is the ftrace int3 handler flag that would have
      prevented the int3 handler from going further.
      
      Here's a description of the issue:
      
      	CPU0				CPU1
      	----				----
        remove_breakpoint();
        modifying_ftrace_code = 0;
      
      				[still sees breakpoint]
      				<takes trap>
      				[sees modifying_ftrace_code as zero]
      				[no breakpoint handler]
      				[goto failed case]
      				[trap exception - kernel breakpoint, no
      				 handler]
      				BUG()
      
      The second bug was that the removal of the breakpoints required the
      "within()" logic instead of accessing the ip address directly, because
      the kernel text is mapped read-only when CONFIG_DEBUG_RODATA is set, and
      the removal of the breakpoint is a modification of the kernel text.
      ftrace_write() includes the "within()" logic, whereas probe_kernel_write()
      does not. This prevented the breakpoint from being removed at all.
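      
      A hedged sketch of the distinction (simplified from the arch/x86 ftrace
      code of that era; the "within()" redirection sends the write through the
      kernel's writable alias of the read-only text mapping):
      
        static inline int
        within(unsigned long addr, unsigned long start, unsigned long end)
        {
                return addr >= start && addr < end;
        }
      
        static int ftrace_write(unsigned long ip, const char *val, int size)
        {
                /*
                 * With CONFIG_DEBUG_RODATA the kernel text mapping is
                 * read-only, so redirect the write to the writable direct
                 * mapping of the same physical page.
                 */
                if (within(ip, (unsigned long)_text, (unsigned long)_etext))
                        ip = (unsigned long)__va(__pa_symbol(ip));
      
                /* a bare probe_kernel_write() would fault on the RO alias */
                return probe_kernel_write((void *)ip, val, size);
        }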
      
      Link: http://lkml.kernel.org/r/1392650573-3390-1-git-send-email-pmladek@suse.cz
      Reported-by: Petr Mladek <pmladek@suse.cz>
      Tested-by: Petr Mladek <pmladek@suse.cz>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. 13 Feb 2014, 1 commit
  4. 12 Feb 2014, 1 commit
    • ftrace/x86: Use breakpoints for converting function graph caller · 87fbb2ac
      Steven Rostedt (Red Hat) authored
      When the conversion was made to remove stop machine and use the breakpoint
      logic instead, the modification of the function graph caller was still
      done directly, as though it were being done under stop machine.
      
      As it is not converted via stop machine anymore, there is a possibility
      that the code could be laid across cache lines, and if another CPU is
      accessing that function graph call while it is being updated, it could
      cause a General Protection Fault.
      
      Convert the update of the function graph caller to use the breakpoint
      method as well.
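      
      For reference, a hedged sketch of the breakpoint method the caller is
      being converted to (the real helpers in arch/x86/kernel/ftrace.c differ
      in detail; ftrace_write() and run_sync() are as sketched earlier):
      
        static const unsigned char brk = 0xcc;  /* int3 */
      
        static int update_call_site(unsigned long ip,
                                    const char *new_code, int size)
        {
                int ret;
      
                /* 1: put a breakpoint on the first byte, then sync */
                ret = ftrace_write(ip, (const char *)&brk, 1);
                if (ret)
                        return ret;
                run_sync();
      
                /* 2: write everything but the first byte, then sync */
                ret = ftrace_write(ip + 1, new_code + 1, size - 1);
                if (ret)
                        return ret;
                run_sync();
      
                /* 3: write the new first byte, replacing the int3 */
                ret = ftrace_write(ip, new_code, 1);
                if (ret)
                        return ret;
                run_sync();
      
                return 0;
        }
      
      A CPU that executes the site mid-update hits the int3 and is steered past
      the instruction instead of executing a half-written call, so the update
      never needs to stop the machine.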
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org # 3.5+
      Fixes: 08d636b6 "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 09 Feb 2014, 1 commit
  6. 07 Feb 2014, 1 commit
  7. 31 Jan 2014, 1 commit
  8. 30 Jan 2014, 4 commits
  9. 28 Jan 2014, 1 commit
  10. 25 Jan 2014, 6 commits
    • mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge · b9a3b4c9
      Mel Gorman authored
      There was a large ebizzy performance regression that was
      bisected to commit 611ae8e3 (x86/tlb: enable tlb flush range
      support for x86).  The problem was related to the
      tlb_flushall_shift tuning for IvyBridge, which was altered.  The
      underlying problem is that it is not clear whether the tuning values
      for each CPU family are correct, as the methodology used to tune the
      values is unclear.
      
      This patch uses a conservative tlb_flushall_shift value for all
      CPU families except IvyBridge so the decision can be revisited
      if any regression is found as a result of this change.
      IvyBridge is an exception as testing with one methodology
      determined that the value of 2 is acceptable.  Details are in
      the changelog for the patch "x86: mm: Change tlb_flushall_shift
      for IvyBridge".
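      
      For context, a hedged sketch of what the shift gates (simplified from
      the flush_tlb_mm_range() logic of that era; a larger shift lowers the
      range size at which a full flush is preferred):
      
        /* nr_pages: size of the range to flush; act_entries: TLB capacity */
        if (tlb_flushall_shift == -1 ||
            nr_pages > (act_entries >> tlb_flushall_shift)) {
                local_flush_tlb();              /* flush everything */
        } else {
                for (addr = start; addr < end; addr += PAGE_SIZE)
                        __flush_tlb_single(addr);       /* invlpg per page */
        }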
      
      One important aspect of this to watch out for is Xen.  The
      original commit log mentioned large performance gains on Xen.
      It's possible Xen is more sensitive to this value if it flushes
      small ranges of pages more frequently than workloads on bare
      metal typically do.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Tested-by: Davidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-dyzMww3fqugnhbhgo6Gxmtkw@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86: mm: change tlb_flushall_shift for IvyBridge · f98b7a77
      Mel Gorman authored
      There was a large performance regression that was bisected to
      commit 611ae8e3 ("x86/tlb: enable tlb flush range support for
      x86").  This patch simply changes the default balance point
      between a local and global flush for IvyBridge.
      
      In the interest of allowing the tests to be reproduced, this
      patch was tested using mmtests 0.15 with the following
      configurations
      
      	configs/config-global-dhp__tlbflush-performance
      	configs/config-global-dhp__scheduler-performance
      	configs/config-global-dhp__network-performance
      
      Results are from two machines
      
      Ivybridge   4 threads:  Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz
      Ivybridge   8 threads:  Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
      
      Page fault microbenchmark showed nothing interesting.
      
      Ebizzy was configured to run multiple iterations and threads.
      Thread counts ranged from 1 to NR_CPUS*2. For each thread count,
      it ran 100 iterations and each iteration lasted 10 seconds.
      
      Ivybridge 4 threads
                          3.13.0-rc7            3.13.0-rc7
                             vanilla           altshift-v3
      Mean   1     6395.44 (  0.00%)     6789.09 (  6.16%)
      Mean   2     7012.85 (  0.00%)     8052.16 ( 14.82%)
      Mean   3     6403.04 (  0.00%)     6973.74 (  8.91%)
      Mean   4     6135.32 (  0.00%)     6582.33 (  7.29%)
      Mean   5     6095.69 (  0.00%)     6526.68 (  7.07%)
      Mean   6     6114.33 (  0.00%)     6416.64 (  4.94%)
      Mean   7     6085.10 (  0.00%)     6448.51 (  5.97%)
      Mean   8     6120.62 (  0.00%)     6462.97 (  5.59%)
      
      Ivybridge 8 threads
                           3.13.0-rc7            3.13.0-rc7
                              vanilla           altshift-v3
      Mean   1      7336.65 (  0.00%)     7787.02 (  6.14%)
      Mean   2      8218.41 (  0.00%)     9484.13 ( 15.40%)
      Mean   3      7973.62 (  0.00%)     8922.01 ( 11.89%)
      Mean   4      7798.33 (  0.00%)     8567.03 (  9.86%)
      Mean   5      7158.72 (  0.00%)     8214.23 ( 14.74%)
      Mean   6      6852.27 (  0.00%)     7952.45 ( 16.06%)
      Mean   7      6774.65 (  0.00%)     7536.35 ( 11.24%)
      Mean   8      6510.50 (  0.00%)     6894.05 (  5.89%)
      Mean   12     6182.90 (  0.00%)     6661.29 (  7.74%)
      Mean   16     6100.09 (  0.00%)     6608.69 (  8.34%)
      
      Ebizzy hits the worst case scenario for TLB range flushing every
      time, and it shows that for these IvyBridge CPUs at least the
      default choice is a poor one.  The patch addresses the problem.
      
      Next was a tlbflush microbenchmark written by Alex Shi at
      http://marc.info/?l=linux-kernel&m=133727348217113 .  It
      measures access costs while the TLB is being flushed.  The
      expectation is that if there are always full TLB flushes, the
      benchmark will suffer, whereas it benefits from range flushing.
      
      There are 320 iterations of the test per thread count.  The
      number of entries is randomly selected with a min of 1 and max
      of 512.  To ensure a reasonably even spread of entries, the full
      range is broken up into 8 sections and a random number selected
      within that section.
      
      iteration 1, random number between 0-64
      iteration 2, random number between 64-128 etc
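      
      In code, the stratified selection might look like this (a sketch, not
      the benchmark's actual source):
      
        /* split the 1..512 range into 8 sections; iteration i draws
         * uniformly from section i % 8 */
        static int pick_entries(int iteration)
        {
                int lo = (iteration % 8) * 64;  /* 0, 64, ..., 448 */
      
                return lo + 1 + rand() % 64;    /* 1..64, 65..128, ... */
        }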
      
      This is still a very weak methodology.  When you do not know
      what the typical ranges are, random is a reasonable choice, but it
      can easily be argued that the optimisation was for smaller ranges
      and that an even spread is not representative of any workload that
      matters.  To improve this, we'd need to know the probability
      distribution of TLB flush range sizes for a set of workloads
      that are considered "common", build a synthetic trace and feed
      that into this benchmark.  Even that is not perfect because it
      would not account for the time between flushes, but there are
      limits to what can reasonably be done while still doing
      something useful.  If a representative synthetic trace is
      provided, then this benchmark could be revisited and the shift
      values retuned.
      
      Ivybridge 4 threads
                              3.13.0-rc7            3.13.0-rc7
                                 vanilla           altshift-v3
      Mean       1       10.50 (  0.00%)       10.50 (  0.03%)
      Mean       2       17.59 (  0.00%)       17.18 (  2.34%)
      Mean       3       22.98 (  0.00%)       21.74 (  5.41%)
      Mean       5       47.13 (  0.00%)       46.23 (  1.92%)
      Mean       8       43.30 (  0.00%)       42.56 (  1.72%)
      
      Ivybridge 8 threads
                               3.13.0-rc7            3.13.0-rc7
                                  vanilla           altshift-v3
      Mean       1         9.45 (  0.00%)        9.36 (  0.93%)
      Mean       2         9.37 (  0.00%)        9.70 ( -3.54%)
      Mean       3         9.36 (  0.00%)        9.29 (  0.70%)
      Mean       5        14.49 (  0.00%)       15.04 ( -3.75%)
      Mean       8        41.08 (  0.00%)       38.73 (  5.71%)
      Mean       13       32.04 (  0.00%)       31.24 (  2.49%)
      Mean       16       40.05 (  0.00%)       39.04 (  2.51%)
      
      For both CPUs, average access time is reduced, which is good, as
      this is the benchmark that was used to tune the shift values in
      the first place, albeit it is not known *how* the benchmark was
      used.
      
      The scheduler benchmarks were somewhat inconclusive.  They
      showed gains and losses, which makes me reconsider how stable those
      benchmarks really are, and whether something else might be
      interfering with the test results recently.
      
      Network benchmarks were inconclusive.  Almost all results were
      flat except for the netperf-udp tests on the 4 thread machine.
      These results were unstable and showed large variations between
      reboots.  It is unknown if this is a recent problem, but I've
      noticed before that netperf-udp results tend to vary.
      
      Based on these results, changing the default for Ivybridge seems
      like a logical choice.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Tested-by: Davidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: Alex Shi <alex.shi@linaro.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • mm, x86: Account for TLB flushes only when debugging · ec659934
      Mel Gorman authored
      Bisection between 3.11 and 3.12 fingered commit 9824cf97 ("mm:
      vmstats: tlb flush counters") as the cause of overhead problems.
      
      The counters are undeniably useful, but how often do we really
      need to debug TLB flush related issues?  It does not justify
      taking the penalty everywhere, so make it a debugging option.
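      
      The shape of the change, sketched (the actual option and macro names in
      the commit may differ slightly):
      
        #ifdef CONFIG_DEBUG_TLBFLUSH
        #define count_vm_tlb_event(x)   count_vm_event(x)
        #else
        #define count_vm_tlb_event(x)   do {} while (0)
        #endif
      
      Flush paths then call count_vm_tlb_event() instead of count_vm_event(),
      so production builds compile the accounting away entirely.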
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Tested-by: Davidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-XzxjntugxuwpxXhcrxqqh53b@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/uv/nmi: Fix Sparse warnings · 74c93f9d
      Mike Travis authored
      Make uv_register_nmi_notifier() and uv_handle_nmi_ping() static
      to address sparse warnings.
      
      Fix problem where uv_nmi_kexec_failed is unused when
      CONFIG_KEXEC is not defined.
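      
      The shape of the fix, sketched (signatures are illustrative):
      
        /* file-local linkage is what sparse was asking for */
        static int uv_handle_nmi_ping(unsigned int reason,
                                      struct pt_regs *regs);
        static void uv_register_nmi_notifier(void);
      
        #ifdef CONFIG_KEXEC     /* only referenced by the kexec path */
        static atomic_t uv_nmi_kexec_failed;
        #endif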
      Signed-off-by: Mike Travis <travis@sgi.com>
      Reviewed-by: Hedi Berriche <hedi@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Link: http://lkml.kernel.org/r/20140114162551.480872353@asylum.americas.sgi.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/AMD/NB: Fix amd_set_subcaches() parameter type · 2993ae33
      Dan Carpenter authored
      This is under CAP_SYS_ADMIN, but Smatch complains that mask comes
      from the user and the test for "mask > 0xf" can underflow.
      
      The fix is simple: amd_set_subcaches() should take not an 'int'
      but an 'unsigned long', as it was originally intended to do.
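      
      The signedness problem is easy to demonstrate in isolation (values are
      illustrative; the real check sits in front of amd_set_subcaches()):
      
        /* a negative value slips past the range check when mask is int */
        static int rejected_int(int mask)             { return mask > 0xf; }
        /* as unsigned long, -1 becomes huge and is caught */
        static int rejected_ulong(unsigned long mask) { return mask > 0xf; }
      
        /* rejected_int(-1)   == 0  -> accepted, the bug */
        /* rejected_ulong(-1) == 1  -> rejected, the fix */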
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale-asia.com>
      Link: http://lkml.kernel.org/r/20140121072209.GA22095@elgon.mountain
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/quirks: Add workaround for AMD F16h Erratum792 · fb53a1ab
      Aravind Gopalakrishnan authored
      The workaround for this Erratum is included in AGESA. But only BIOSes
      spun after Jan 2014 will have the fix (at least server
      versions of the chip). The erratum affects both embedded and
      server platforms, and since we cannot say with certainty that
      ALL BIOSes on systems out in the field will have the fix, we
      should probably insulate ourselves in case the BIOS does not do the
      right thing or someone is using an old BIOS.
      
      Refer to the Revision Guide for AMD F16h models 00h-0fh, document 51810
      Rev. 3.04, November 2013, for details on the Erratum.
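      
      A hedged sketch of the usual shape of such a quirk; the MSR address and
      bit below are placeholders, not the erratum's real values (those are in
      the Revision Guide):
      
        static void quirk_amd_erratum_792(void)
        {
                u64 val;
      
                rdmsrl(0xc0011000, val);                /* placeholder MSR     */
                if (!(val & (1ULL << 15))) {            /* placeholder fix bit */
                        pr_info("Erratum 792 not handled by BIOS, applying workaround\n");
                        wrmsrl(0xc0011000, val | (1ULL << 15));
                }
        }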
      
      Tested the patch on a Fam16h server platform and it works fine.
      Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Cc: <hmh@hmh.eng.br>
      Cc: <Kim.Naru@amd.com>
      Cc: <Suravee.Suthikulpanit@amd.com>
      Cc: <bp@suse.de>
      Cc: <sherry.hurwitz@amd.com>
      Link: http://lkml.kernel.org/r/1390515212-1824-1-git-send-email-Aravind.Gopalakrishnan@amd.com
      [ Minor edits. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 23 Jan 2014, 1 commit
  12. 22 Jan 2014, 2 commits
  13. 17 Jan 2014, 1 commit
  14. 16 Jan 2014, 5 commits
    • perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h · bee09ed9
      Robert Richter authored
      On AMD family 10h we see the following error messages while waking up
      from S3 for all non-boot CPUs, leading to a failed IBS initialization:
      
       Enabling non-boot CPUs ...
       smpboot: Booting Node 0 Processor 1 APIC 0x1
       [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
       perf: IBS APIC setup failed on cpu #1
       process: Switch to broadcast mode on CPU1
       CPU1 is up
       ...
       ACPI: Waking up from system sleep state S3
      
      The reason for this is that during suspend the LVT offset for the IBS
      vector gets lost and needs to be reinitialized while resuming.
      
      The offset is read from the IBSCTL MSR. On family 10h the offset needs
      to be 1, as offset 0 is used for the MCE threshold interrupt, but the
      firmware assigns it to 0 for IBS too. The kernel needs to reprogram
      the vector. The MSR is a read-only node MSR, but a new value can be
      written via a PCI config space access. The reinitialization is
      implemented for family 10h in setup_ibs_ctl(), which is forced during
      IBS setup.
      
      This patch fixes IBS setup after waking up from S3 by adding
      suspend/resume hooks for the boot CPU, which does the offset
      reinitialization.
      
      Marking it as stable to let distros pick up this fix.
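      
      The shape of the fix with the syscore API, sketched (the resume hook
      re-runs the IBSCTL offset programming; the setup_ibs_ctl() call and its
      argument are simplified):
      
        static void perf_ibs_resume(void)
        {
                /* reprogram the IBS LVT offset that firmware lost over S3 */
                setup_ibs_ctl(1);       /* offset 1 on family 10h */
        }
      
        static struct syscore_ops perf_ibs_syscore_ops = {
                .resume = perf_ibs_resume,
        };
      
        register_syscore_ops(&perf_ibs_syscore_ops);    /* at IBS init */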
      Signed-off-by: Robert Richter <rric@kernel.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> v3.2..
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs · 7da7c156
      Bin Gao authored
      On SoCs that have the calibration MSRs available, either there is no
      PIT, HPET or PMTIMER to calibrate against, or the PIT/HPET/PMTIMER is
      driven from the same clock as the TSC, so calibration is redundant and
      just slows down the boot.
      
      The TSC rate is calculated by this formula:
      
      	<maximum core-clock to bus-clock ratio> * <maximum resolved frequency>
      
      The ratio and the resolved frequency ID can be obtained from MSRs.
      See the Intel 64 and IA-32 System Programming Guide, sections 16.12 and
      30.11.5, for details.
      Signed-off-by: Bin Gao <bin.gao@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-rgm7xmg7k6qnjlw3ynkcjsmh@git.kernel.org
    • x86: Add check for number of available vectors before CPU down · da6139e4
      Prarit Bhargava authored
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
      
      When a cpu is downed on a system, the irqs on the cpu are assigned to
      other cpus.  It is possible, however, that when a cpu is downed there
      aren't enough free vectors on the remaining cpus to account for the
      vectors from the cpu that is being downed.
      
      This results in an interesting "overflow" condition where irqs are
      "assigned" to a CPU but are not handled.
      
      For example, when downing cpus on a system with 64 logical processors:
      
      <snip>
      [  232.021745] smpboot: CPU 61 is now offline
      [  238.480275] smpboot: CPU 62 is now offline
      [  245.991080] ------------[ cut here ]------------
      [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
      [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
      [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
      [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
      [  246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [  246.057371]  0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
      [  246.065728]  ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
      [  246.074073]  0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
      [  246.082430] Call Trace:
      [  246.085174]  <IRQ>  [<ffffffff8164fbf6>] dump_stack+0x46/0x58
      [  246.091633]  [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0
      [  246.098352]  [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50
      [  246.104786]  [<ffffffff815710d6>] dev_watchdog+0x246/0x250
      [  246.110923]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.119097]  [<ffffffff8106092a>] call_timer_fn+0x3a/0x110
      [  246.125224]  [<ffffffff8106280f>] ? update_process_times+0x6f/0x80
      [  246.132137]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.140308]  [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0
      [  246.146933]  [<ffffffff81059a80>] __do_softirq+0xe0/0x220
      [  246.152976]  [<ffffffff8165fedc>] call_softirq+0x1c/0x30
      [  246.158920]  [<ffffffff810045f5>] do_softirq+0x55/0x90
      [  246.164670]  [<ffffffff81059d35>] irq_exit+0xa5/0xb0
      [  246.170227]  [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60
      [  246.177324]  [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70
      [  246.184041]  <EOI>  [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0
      [  246.191559]  [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0
      [  246.198374]  [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200
      [  246.204900]  [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30
      [  246.210846]  [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250
      [  246.217371]  [<ffffffff81646b47>] rest_init+0x77/0x80
      [  246.223028]  [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb
      [  246.229165]  [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e
      [  246.235787]  [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c
      [  246.242990]  [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc
      [  246.249610] ---[ end trace fb74fdef54d79039 ]---
      [  246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
      [  246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
      Last login: Mon Nov 11 08:35:14 from 10.18.17.119
      [root@(none) ~]# [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      
      (last lines keep repeating.  ixgbe driver is dead until module reload.)
      
      If the downed cpu has more vectors than are free on the remaining cpus on the
      system, it is possible that some vectors are "orphaned" even though they are
      assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
      watchdog fired and notified that something was wrong.
      
      This patch adds a function, check_vectors(), to compare the number of
      vectors on the CPU going down with the number of vectors available on
      the rest of the system.  If there aren't enough vectors for the CPU to
      go down, an error is returned and propagated back to userspace.
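      
      The counting logic, sketched (the merged helper follows this shape; the
      per-CPU vector_irq table marks unused slots with a negative value):
      
        int check_vectors_sketch(int dying_cpu)
        {
                unsigned int needed = 0, avail = 0;
                int cpu, vector;
      
                for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++)
                        if (per_cpu(vector_irq, dying_cpu)[vector] >= 0)
                                needed++;       /* in use on the dying CPU */
      
                for_each_online_cpu(cpu) {
                        if (cpu == dying_cpu)
                                continue;
                        for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++)
                                if (per_cpu(vector_irq, cpu)[vector] < 0)
                                        avail++;        /* free slot elsewhere */
                }
      
                return avail >= needed ? 0 : -ENOSPC;   /* error reaches userspace */
        }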
      
      v2: Do not need to look at percpu irqs
      v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
          Priority Mode
      v4: Additional changes suggested by Gong Chen.
      v5/v6/v7/v8: Updated comment text
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.com
      Reviewed-by: Gong Chen <gong.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Janet Morgan <janet.morgan@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ruiv Wang <ruiv.wang@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org>
    • x86, apic: Make disabled_cpu_apicid static read_mostly, fix typos · 5b4d1dbc
      H. Peter Anvin authored
      Make disabled_cpu_apicid static and read_mostly, and fix a couple of
      typos.
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/20140115182511.GA22737@gmail.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
    • x86, apic, kexec: Add disable_cpu_apicid kernel parameter · 151e0c7d
      HATAYAMA Daisuke authored
      Add the disable_cpu_apicid kernel parameter. To use this kernel
      parameter, specify the initial APIC ID of the CPU you want to
      disable.
      
      This is mostly used by the kdump 2nd kernel to disable the BSP, so
      that it can wake up multiple CPUs without causing a system reset or
      hang due to sending INIT from an AP to the BSP.
      
      Kdump users first figure out the initial APIC ID of the BSP, CPU0 in
      the 1st kernel, for example from /proc/cpuinfo, and then set up this
      kernel parameter for the 2nd kernel using the obtained APIC ID.
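      
      The parameter plumbing, sketched (this follows the usual early_param
      pattern; treat the details as illustrative):
      
        static int disabled_cpu_apicid __read_mostly = -1;      /* -1: none */
      
        static int __init apic_set_disabled_cpu_apicid(char *arg)
        {
                if (!arg || !get_option(&arg, &disabled_cpu_apicid))
                        return -EINVAL;
                return 0;
        }
        early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
      
      With that, booting the kdump kernel with e.g. disable_cpu_apicid=0 keeps
      the CPU whose initial APIC ID is 0 from being brought up.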
      
      However, doing this procedure manually at each boot is awkward; it
      should be done automatically by user-land service scripts, for
      example kexec-tools on Fedora/RHEL distributions.
      
      This design is more flexible than automatically disabling the BSP at
      kernel boot time, in that at boot time we have no choice but to refer
      to the ACPI/MP table to obtain the initial APIC ID of the BSP, meaning
      that that method is not applicable to systems without such BIOS
      tables.
      
      One assumption behind this design is that users get the initial APIC
      ID of the BSP while the system is still in a healthy state, and so the
      BSP is uniquely kept in CPU0. Thus, through the kernel parameter, only
      one initial APIC ID can be specified.
      
      When comparing with disabled_cpu_apicid, we use read_apic_id(), not
      boot_cpu_physical_apicid, because on some platforms that variable is
      modified to the APIC ID reported as the BSP through the MP table, and
      this function is executed with the temporarily modified
      boot_cpu_physical_apicid. As a result, the disabled_cpu_apicid kernel
      parameter wouldn't work well for the APIC IDs of APs.
      
      Fixing the wrong handling of boot_cpu_physical_apicid requires review
      and testing across a range of platforms, and it could take some
      time. The fix here is a kind of workaround so we can focus on the
      main topic of this patch.
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Link: http://lkml.kernel.org/r/20140115064458.1545.38775.stgit@localhost6.localdomain6
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  15. 15 Jan 2014, 2 commits
  16. 14 Jan 2014, 5 commits
  17. 13 Jan 2014, 3 commits
    • sched/clock, x86: Avoid a runtime condition in native_sched_clock() · 10b033d4
      Peter Zijlstra authored
      Use a static_key to avoid touching tsc_disabled and a runtime
      condition in native_sched_clock() -- fewer cachelines touched is always
      better.
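      
      The pattern, sketched with the 3.13-era static_key API (simplified; the
      fallback path is abbreviated):
      
        static struct static_key __use_tsc = STATIC_KEY_INIT;
      
        u64 native_sched_clock(void)
        {
                /* patched branch: no load of tsc_disabled in the hot path */
                if (static_key_false(&__use_tsc)) {
                        u64 now;
      
                        rdtscll(now);
                        return cycles_2_ns(now);
                }
                /* fallback when the TSC is unusable */
                return (u64)(jiffies - INITIAL_JIFFIES) * (NSEC_PER_SEC / HZ);
        }
      
        /* at boot, once the TSC is deemed usable: */
        static_key_slow_inc(&__use_tsc);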
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     215295    213039
          (cold) local_clock: 301773     220773    216084
          (warm) sched_clock: 38375      25659     25231
          (warm) local_clock: 100371     27242     27601
          (warm) rdtsc:       27340      24208     24203
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     237019    240055
          (cold) local_clock: 396890     294819    299942
          (warm) sched_clock: 38194      25609     25276
          (warm) local_clock: 143452     71232     73232
          (warm) rdtsc:       27345      24243     24244
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-hrz87bo37qke25bty6pnfy4b@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/clock, x86: Use a static_key for sched_clock_stable · 35af99e6
      Peter Zijlstra authored
      In order to avoid the runtime condition and variable load, turn
      sched_clock_stable into a static_key.
      
      Also provide a shorter implementation of local_clock() and
      cpu_clock(int) when sched_clock_stable==1.
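      
      The shortcut, sketched (the key is arranged so that the likely, stable
      case takes the patched fast path; the branch sense and the key name
      __sched_clock_unstable are illustrative):
      
        u64 local_clock(void)
        {
                /* unlikely: clock unstable, use the filtered per-CPU clock */
                if (static_key_false(&__sched_clock_unstable))
                        return sched_clock_cpu(raw_smp_processor_id());
      
                /* stable TSC: sched_clock() itself is already coherent */
                return sched_clock();
        }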
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     221876    215295
          (cold) local_clock: 301773     234692    220773
          (warm) sched_clock: 38375      25602     25659
          (warm) local_clock: 100371     33265     27242
          (warm) rdtsc:       27340      24214     24208
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     235941    237019
          (cold) local_clock: 396890     297017    294819
          (warm) sched_clock: 38194      25233     25609
          (warm) local_clock: 143452     71234     71232
          (warm) rdtsc:       27345      24245     24243
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs · 20d1c86a
      Peter Zijlstra authored
      Use a ring-buffer-like multi-version object structure which allows
      always having a coherent object; we use this to avoid having to
      disable IRQs while reading sched_clock(), and it avoids a problem when
      getting an NMI while changing the cyc2ns data.
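      
      The read side of such a multi-version structure, sketched (this is the
      seqcount-latch idea; names and barriers are simplified relative to the
      actual per-CPU implementation):
      
        struct cyc2ns_data {
                u32 cyc2ns_mul;
                u32 cyc2ns_shift;
                u64 cyc2ns_offset;
        };
      
        struct cyc2ns_latch {
                unsigned int seq;               /* low bit picks the live copy */
                struct cyc2ns_data data[2];     /* one copy is always coherent */
        };
      
        /* lockless and NMI-safe: retry if the writer moved underneath us */
        static struct cyc2ns_data cyc2ns_read(struct cyc2ns_latch *l)
        {
                struct cyc2ns_data snap;
                unsigned int seq;
      
                do {
                        seq = ACCESS_ONCE(l->seq);
                        smp_rmb();
                        snap = l->data[seq & 1];
                        smp_rmb();
                } while (seq != ACCESS_ONCE(l->seq));
      
                return snap;
        }
      
      The writer updates the copy that readers are currently steered away from
      and only then flips seq, so a reader -- even one running in an NMI that
      interrupted the writer mid-update -- either snapshots a complete
      mul/shift/offset triplet or retries.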
      
                              MAINLINE   PRE        POST
      
          sched_clock_stable: 1          1          1
          (cold) sched_clock: 329841     331312     257223
          (cold) local_clock: 301773     310296     309889
          (warm) sched_clock: 38375      38247      25280
          (warm) local_clock: 100371     102713     85268
          (warm) rdtsc:       27340      27289      24247
          sched_clock_stable: 0          0          0
          (cold) sched_clock: 382634     372706     301224
          (cold) local_clock: 396890     399275     399870
          (warm) sched_clock: 38194      38124      25630
          (warm) local_clock: 143452     148698     129629
          (warm) rdtsc:       27345      27365      24307
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>