1. 11 3月, 2014 1 次提交
  2. 09 2月, 2014 3 次提交
  3. 07 2月, 2014 1 次提交
  4. 31 1月, 2014 1 次提交
  5. 30 1月, 2014 4 次提交
  6. 28 1月, 2014 1 次提交
  7. 25 1月, 2014 6 次提交
    • M
      mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge · b9a3b4c9
      Mel Gorman 提交于
      There was a large ebizzy performance regression that was
      bisected to commit 611ae8e3 (x86/tlb: enable tlb flush range
      support for x86).  The problem was related to the
      tlb_flushall_shift tuning for IvyBridge which was altered.  The
      problem is that it is not clear if the tuning values for each
      CPU family is correct as the methodology used to tune the values
      is unclear.
      
      This patch uses a conservative tlb_flushall_shift value for all
      CPU families except IvyBridge so the decision can be revisited
      if any regression is found as a result of this change.
      IvyBridge is an exception as testing with one methodology
      determined that the value of 2 is acceptable.  Details are in
      the changelog for the patch "x86: mm: Change tlb_flushall_shift
      for IvyBridge".
      
      One important aspect of this to watch out for is Xen.  The
      original commit log mentioned large performance gains on Xen.
      It's possible Xen is more sensitive to this value if it flushes
      small ranges of pages more frequently than workloads on bare
      metal typically do.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Tested-by: NDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-dyzMww3fqugnhbhgo6Gxmtkw@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b9a3b4c9
    • M
      x86: mm: change tlb_flushall_shift for IvyBridge · f98b7a77
      Mel Gorman 提交于
      There was a large performance regression that was bisected to
      commit 611ae8e3 ("x86/tlb: enable tlb flush range support for
      x86").  This patch simply changes the default balance point
      between a local and global flush for IvyBridge.
      
      In the interest of allowing the tests to be reproduced, this
      patch was tested using mmtests 0.15 with the following
      configurations
      
      	configs/config-global-dhp__tlbflush-performance
      	configs/config-global-dhp__scheduler-performance
      	configs/config-global-dhp__network-performance
      
      Results are from two machines
      
      Ivybridge   4 threads:  Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz
      Ivybridge   8 threads:  Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
      
      Page fault microbenchmark showed nothing interesting.
      
      Ebizzy was configured to run multiple iterations and threads.
      Thread counts ranged from 1 to NR_CPUS*2. For each thread count,
      it ran 100 iterations and each iteration lasted 10 seconds.
      
      Ivybridge 4 threads
                          3.13.0-rc7            3.13.0-rc7
                             vanilla           altshift-v3
      Mean   1     6395.44 (  0.00%)     6789.09 (  6.16%)
      Mean   2     7012.85 (  0.00%)     8052.16 ( 14.82%)
      Mean   3     6403.04 (  0.00%)     6973.74 (  8.91%)
      Mean   4     6135.32 (  0.00%)     6582.33 (  7.29%)
      Mean   5     6095.69 (  0.00%)     6526.68 (  7.07%)
      Mean   6     6114.33 (  0.00%)     6416.64 (  4.94%)
      Mean   7     6085.10 (  0.00%)     6448.51 (  5.97%)
      Mean   8     6120.62 (  0.00%)     6462.97 (  5.59%)
      
      Ivybridge 8 threads
                           3.13.0-rc7            3.13.0-rc7
                              vanilla           altshift-v3
      Mean   1      7336.65 (  0.00%)     7787.02 (  6.14%)
      Mean   2      8218.41 (  0.00%)     9484.13 ( 15.40%)
      Mean   3      7973.62 (  0.00%)     8922.01 ( 11.89%)
      Mean   4      7798.33 (  0.00%)     8567.03 (  9.86%)
      Mean   5      7158.72 (  0.00%)     8214.23 ( 14.74%)
      Mean   6      6852.27 (  0.00%)     7952.45 ( 16.06%)
      Mean   7      6774.65 (  0.00%)     7536.35 ( 11.24%)
      Mean   8      6510.50 (  0.00%)     6894.05 (  5.89%)
      Mean   12     6182.90 (  0.00%)     6661.29 (  7.74%)
      Mean   16     6100.09 (  0.00%)     6608.69 (  8.34%)
      
      Ebizzy hits the worst case scenario for TLB range flushing every
      time and it shows for these Ivybridge CPUs at least that the
      default choice is a poor on.  The patch addresses the problem.
      
      Next was a tlbflush microbenchmark written by Alex Shi at
      http://marc.info/?l=linux-kernel&m=133727348217113 .  It
      measures access costs while the TLB is being flushed.  The
      expectation is that if there are always full TLB flushes that
      the benchmark would suffer and it benefits from range flushing
      
      There are 320 iterations of the test per thread count.  The
      number of entries is randomly selected with a min of 1 and max
      of 512.  To ensure a reasonably even spread of entries, the full
      range is broken up into 8 sections and a random number selected
      within that section.
      
      iteration 1, random number between 0-64
      iteration 2, random number between 64-128 etc
      
      This is still a very weak methodology.  When you do not know
      what are typical ranges, random is a reasonable choice but it
      can be easily argued that the opimisation was for smaller ranges
      and an even spread is not representative of any workload that
      matters.  To improve this, we'd need to know the probability
      distribution of TLB flush range sizes for a set of workloads
      that are considered "common", build a synthetic trace and feed
      that into this benchmark.  Even that is not perfect because it
      would not account for the time between flushes but there are
      limits of what can be reasonably done and still be doing
      something useful.  If a representative synthetic trace is
      provided then this benchmark could be revisited and the shift values retuned.
      
      Ivybridge 4 threads
                              3.13.0-rc7            3.13.0-rc7
                                 vanilla           altshift-v3
      Mean       1       10.50 (  0.00%)       10.50 (  0.03%)
      Mean       2       17.59 (  0.00%)       17.18 (  2.34%)
      Mean       3       22.98 (  0.00%)       21.74 (  5.41%)
      Mean       5       47.13 (  0.00%)       46.23 (  1.92%)
      Mean       8       43.30 (  0.00%)       42.56 (  1.72%)
      
      Ivybridge 8 threads
                               3.13.0-rc7            3.13.0-rc7
                                  vanilla           altshift-v3
      Mean       1         9.45 (  0.00%)        9.36 (  0.93%)
      Mean       2         9.37 (  0.00%)        9.70 ( -3.54%)
      Mean       3         9.36 (  0.00%)        9.29 (  0.70%)
      Mean       5        14.49 (  0.00%)       15.04 ( -3.75%)
      Mean       8        41.08 (  0.00%)       38.73 (  5.71%)
      Mean       13       32.04 (  0.00%)       31.24 (  2.49%)
      Mean       16       40.05 (  0.00%)       39.04 (  2.51%)
      
      For both CPUs, average access time is reduced which is good as
      this is the benchmark that was used to tune the shift values in
      the first place albeit it is now known *how* the benchmark was
      used.
      
      The scheduler benchmarks were somewhat inconclusive.  They
      showed gains and losses and makes me reconsider how stable those
      benchmarks really are or if something else might be interfering
      with the test results recently.
      
      Network benchmarks were inconclusive.  Almost all results were
      flat except for netperf-udp tests on the 4 thread machine.
      These results were unstable and showed large variations between
      reboots.  It is unknown if this is a recent problems but I've
      noticed before that netperf-udp results tend to vary.
      
      Based on these results, changing the default for Ivybridge seems
      like a logical choice.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Tested-by: NDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: NAlex Shi <alex.shi@linaro.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f98b7a77
    • M
      mm, x86: Account for TLB flushes only when debugging · ec659934
      Mel Gorman 提交于
      Bisection between 3.11 and 3.12 fingered commit 9824cf97 ("mm:
      vmstats: tlb flush counters") to cause overhead problems.
      
      The counters are undeniably useful but how often do we really
      need to debug TLB flush related issues?  It does not justify
      taking the penalty everywhere so make it a debugging option.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Tested-by: NDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-XzxjntugxuwpxXhcrxqqh53b@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ec659934
    • M
      x86/uv/nmi: Fix Sparse warnings · 74c93f9d
      Mike Travis 提交于
      Make uv_register_nmi_notifier() and uv_handle_nmi_ping() static
      to address sparse warnings.
      
      Fix problem where uv_nmi_kexec_failed is unused when
      CONFIG_KEXEC is not defined.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Reviewed-by: NHedi Berriche <hedi@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Link: http://lkml.kernel.org/r/20140114162551.480872353@asylum.americas.sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      74c93f9d
    • D
      x86/AMD/NB: Fix amd_set_subcaches() parameter type · 2993ae33
      Dan Carpenter 提交于
      This is under CAP_SYS_ADMIN, but Smatch complains that mask comes
      from the user and the test for "mask > 0xf" can underflow.
      
      The fix is simple: amd_set_subcaches() should hand down not an 'int'
      but an 'unsigned long' like it was originally indended to do.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale-asia.com>
      Link: http://lkml.kernel.org/r/20140121072209.GA22095@elgon.mountainSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2993ae33
    • A
      x86/quirks: Add workaround for AMD F16h Erratum792 · fb53a1ab
      Aravind Gopalakrishnan 提交于
      The workaround for this Erratum is included in AGESA. But BIOSes
      spun only after Jan2014 will have the fix (atleast server
      versions of the chip). The erratum affects both embedded and
      server platforms and since we cannot say with certainity that
      ALL BIOSes on systems out in the field will have the fix, we
      should probably insulate ourselves in case BIOS does not do the
      right thing or someone is using old BIOSes.
      
      Refer to Revision Guide for AMD F16h models 00h-0fh, document 51810
      Rev. 3.04, November2013 for details on the Erratum.
      
      Tested the patch on Fam16h server platform and it works fine.
      Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Cc: <hmh@hmh.eng.br>
      Cc: <Kim.Naru@amd.com>
      Cc: <Suravee.Suthikulpanit@amd.com>
      Cc: <bp@suse.de>
      Cc: <sherry.hurwitz@amd.com>
      Link: http://lkml.kernel.org/r/1390515212-1824-1-git-send-email-Aravind.Gopalakrishnan@amd.com
      [ Minor edits. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      fb53a1ab
  8. 23 1月, 2014 1 次提交
  9. 22 1月, 2014 2 次提交
  10. 17 1月, 2014 1 次提交
  11. 16 1月, 2014 5 次提交
    • R
      perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h · bee09ed9
      Robert Richter 提交于
      On AMD family 10h we see following error messages while waking up from
      S3 for all non-boot CPUs leading to a failed IBS initialization:
      
       Enabling non-boot CPUs ...
       smpboot: Booting Node 0 Processor 1 APIC 0x1
       [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
       perf: IBS APIC setup failed on cpu #1
       process: Switch to broadcast mode on CPU1
       CPU1 is up
       ...
       ACPI: Waking up from system sleep state S3
      
      Reason for this is that during suspend the LVT offset for the IBS
      vector gets lost and needs to be reinialized while resuming.
      
      The offset is read from the IBSCTL msr. On family 10h the offset needs
      to be 1 as offset 0 is used for the MCE threshold interrupt, but
      firmware assings it for IBS to 0 too. The kernel needs to reprogram
      the vector. The msr is a readonly node msr, but a new value can be
      written via pci config space access. The reinitialization is
      implemented for family 10h in setup_ibs_ctl() which is forced during
      IBS setup.
      
      This patch fixes IBS setup after waking up from S3 by adding
      resume/supend hooks for the boot cpu which does the offset
      reinitialization.
      
      Marking it as stable to let distros pick up this fix.
      Signed-off-by: NRobert Richter <rric@kernel.org>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> v3.2..
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bee09ed9
    • B
      x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs · 7da7c156
      Bin Gao 提交于
      On SoCs that have the calibration MSRs available, either there is no
      PIT, HPET or PMTIMER to calibrate against, or the PIT/HPET/PMTIMER is
      driven from the same clock as the TSC, so calibration is redundant and
      just slows down the boot.
      
      TSC rate is caculated by this formula:
      <maximum core-clock to bus-clock ratio> * <maximum resolved frequency>
      The ratio and the resolved frequency ID can be obtained from MSR.
      See Intel 64 and IA-32 System Programming Guid section 16.12 and 30.11.5
      for details.
      Signed-off-by: NBin Gao <bin.gao@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-rgm7xmg7k6qnjlw3ynkcjsmh@git.kernel.org
      7da7c156
    • P
      x86: Add check for number of available vectors before CPU down · da6139e4
      Prarit Bhargava 提交于
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
      
      When a cpu is downed on a system, the irqs on the cpu are assigned to
      other cpus.  It is possible, however, that when a cpu is downed there
      aren't enough free vectors on the remaining cpus to account for the
      vectors from the cpu that is being downed.
      
      This results in an interesting "overflow" condition where irqs are
      "assigned" to a CPU but are not handled.
      
      For example, when downing cpus on a 1-64 logical processor system:
      
      <snip>
      [  232.021745] smpboot: CPU 61 is now offline
      [  238.480275] smpboot: CPU 62 is now offline
      [  245.991080] ------------[ cut here ]------------
      [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
      [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
      [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
      [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
      [  246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [  246.057371]  0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
      [  246.065728]  ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
      [  246.074073]  0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
      [  246.082430] Call Trace:
      [  246.085174]  <IRQ>  [<ffffffff8164fbf6>] dump_stack+0x46/0x58
      [  246.091633]  [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0
      [  246.098352]  [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50
      [  246.104786]  [<ffffffff815710d6>] dev_watchdog+0x246/0x250
      [  246.110923]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.119097]  [<ffffffff8106092a>] call_timer_fn+0x3a/0x110
      [  246.125224]  [<ffffffff8106280f>] ? update_process_times+0x6f/0x80
      [  246.132137]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.140308]  [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0
      [  246.146933]  [<ffffffff81059a80>] __do_softirq+0xe0/0x220
      [  246.152976]  [<ffffffff8165fedc>] call_softirq+0x1c/0x30
      [  246.158920]  [<ffffffff810045f5>] do_softirq+0x55/0x90
      [  246.164670]  [<ffffffff81059d35>] irq_exit+0xa5/0xb0
      [  246.170227]  [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60
      [  246.177324]  [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70
      [  246.184041]  <EOI>  [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0
      [  246.191559]  [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0
      [  246.198374]  [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200
      [  246.204900]  [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30
      [  246.210846]  [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250
      [  246.217371]  [<ffffffff81646b47>] rest_init+0x77/0x80
      [  246.223028]  [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb
      [  246.229165]  [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e
      [  246.235787]  [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c
      [  246.242990]  [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc
      [  246.249610] ---[ end trace fb74fdef54d79039 ]---
      [  246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
      [  246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
      Last login: Mon Nov 11 08:35:14 from 10.18.17.119
      [root@(none) ~]# [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      
      (last lines keep repeating.  ixgbe driver is dead until module reload.)
      
      If the downed cpu has more vectors than are free on the remaining cpus on the
      system, it is possible that some vectors are "orphaned" even though they are
      assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
      watchdog fired and notified that something was wrong.
      
      This patch adds a function, check_vectors(), to compare the number of vectors
      on the CPU going down and compares it to the number of vectors available on
      the system.  If there aren't enough vectors for the CPU to go down, an
      error is returned and propogated back to userspace.
      
      v2: Do not need to look at percpu irqs
      v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
          Priority Mode
      v4: Additional changes suggested by Gong Chen.
      v5/v6/v7/v8: Updated comment text
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.comReviewed-by: NGong Chen <gong.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Janet Morgan <janet.morgan@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ruiv Wang <ruiv.wang@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      da6139e4
    • H
      x86, apic: Make disabled_cpu_apicid static read_mostly, fix typos · 5b4d1dbc
      H. Peter Anvin 提交于
      Make disabled_cpu_apicid static and read_mostly, and fix a couple of
      typos.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/20140115182511.GA22737@gmail.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      5b4d1dbc
    • H
      x86, apic, kexec: Add disable_cpu_apicid kernel parameter · 151e0c7d
      HATAYAMA Daisuke 提交于
      Add disable_cpu_apicid kernel parameter. To use this kernel parameter,
      specify an initial APIC ID of the corresponding CPU you want to
      disable.
      
      This is mostly used for the kdump 2nd kernel to disable BSP to wake up
      multiple CPUs without causing system reset or hang due to sending INIT
      from AP to BSP.
      
      Kdump users first figure out initial APIC ID of the BSP, CPU0 in the
      1st kernel, for example from /proc/cpuinfo and then set up this kernel
      parameter for the 2nd kernel using the obtained APIC ID.
      
      However, doing this procedure at each boot time manually is awkward,
      which should be automatically done by user-land service scripts, for
      example, kexec-tools on fedora/RHEL distributions.
      
      This design is more flexible than disabling BSP in kernel boot time
      automatically in that in kernel boot time we have no choice but
      referring to ACPI/MP table to obtain initial APIC ID for BSP, meaning
      that the method is not applicable to the systems without such BIOS
      tables.
      
      One assumption behind this design is that users get initial APIC ID of
      the BSP in still healthy state and so BSP is uniquely kept in
      CPU0. Thus, through the kernel parameter, only one initial APIC ID can
      be specified.
      
      In a comparison with disabled_cpu_apicid, we use read_apic_id(), not
      boot_cpu_physical_apicid, because on some platforms, the variable is
      modified to the apicid reported as BSP through MP table and this
      function is executed with the temporarily modified
      boot_cpu_physical_apicid. As a result, disabled_cpu_apicid kernel
      parameter doesn't work well for apicids of APs.
      
      Fixing the wrong handling of boot_cpu_physical_apicid requires some
      reviews and tests beyond some platforms and it could take some
      time. The fix here is a kind of workaround to focus on the main topic
      of this patch.
      Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Link: http://lkml.kernel.org/r/20140115064458.1545.38775.stgit@localhost6.localdomain6Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      151e0c7d
  12. 15 1月, 2014 2 次提交
  13. 14 1月, 2014 5 次提交
  14. 13 1月, 2014 5 次提交
  15. 12 1月, 2014 2 次提交
    • B
      x86, mce: Fix mce_start_timer semantics · 4f75d841
      Borislav Petkov 提交于
      So mce_start_timer() has a 'cpu' argument which is supposed to mean to
      start a timer on that cpu. However, the code currently starts a timer on
      the *current* cpu the function runs on and causes the sanity-check in
      mce_timer_fn to fire:
      
      	WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn
      
      because it is running on the wrong cpu.
      
      This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining
      all the cpus in succession.
      
      Then, we were fiddling with the CMCI storm settings when starting the
      timer whereas there's no need for that - if there's storm happening
      on this newly restarted cpu, we're going to be in normal CMCI mode
      initially and then when the CMCI interrupt starts firing, we're going to
      go to the polling mode with the timer real soon.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Tested-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Reviewed-by: NChen, Gong <gong.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
      4f75d841
    • P
      x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs · 9345005f
      Prarit Bhargava 提交于
      During heavy CPU-hotplug operations the following spurious kernel warnings
      can trigger:
      
        do_IRQ: No ... irq handler for vector (irq -1)
      
        [ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ]
      
      When downing a cpu it is possible that there are unhandled irqs
      left in the APIC IRR register.  The following code path shows
      how the problem can occur:
      
       1. CPU 5 is to go down.
      
       2. cpu_disable() on CPU 5 executes with interrupt flag cleared
          by local_irq_save() via stop_machine().
      
       3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because
          interrupt flag is cleared (CPU unabled to handle the irq)
      
       4. IRQs are migrated off of CPU 5, and the vectors' irqs are set
          to -1. 5. stop_machine() finishes cpu_disable()
      
       6. cpu_die() for CPU 5 executes in normal context.
      
       7. CPU 5 attempts to handle IRQ 12 because the IRR is set for
          IRQ 12.  The code attempts to find the vector's IRQ and cannot
          because it has been set to -1. 8. do_IRQ() warning displays
          warning about CPU 5 IRQ 12.
      
      I added a debug printk to output which CPU & vector was
      retriggered and discovered that that we are getting bogus
      events.  I see a 100% correlation between this debug printk in
      fixup_irqs() and the do_IRQ() warning.
      
      This patchset resolves this by adding definitions for
      VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying
      the code to use them.
      
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Reviewed-by: NRui Wang <rui.y.wang@intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: janet.morgan@Intel.com
      Cc: tony.luck@Intel.com
      Cc: ruiv.wang@gmail.com
      Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com
      [ Cleaned up the code a bit. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9345005f