1. 18 May, 2015 1 commit
    • x86/smp/boot: Fix legacy SMP bootup slow-boot bug · 7cb68598
      Ingo Molnar committed
      So while testing kernels using tools/kvm/ (kvmtool) I noticed that it
      booted super slow:
      
      [    0.142991] Performance Events: no PMU driver, software events only.
      [    0.149265] x86: Booting SMP configuration:
      [    0.149765] .... node  #0, CPUs:          #1
      [    0.148304] kvm-clock: cpu 1, msr 2:1bfe9041, secondary cpu clock
      [   10.158813] KVM setup async PF for cpu 1
      [   10.159000]    #2
      [   10.159000] kvm-stealtime: cpu 1, msr 211a4d400
      [   10.158829] kvm-clock: cpu 2, msr 2:1bfe9081, secondary cpu clock
      [   20.167805] KVM setup async PF for cpu 2
      [   20.168000]    #3
      [   20.168000] kvm-stealtime: cpu 2, msr 211a8d400
      [   20.167818] kvm-clock: cpu 3, msr 2:1bfe90c1, secondary cpu clock
      [   30.176902] KVM setup async PF for cpu 3
      [   30.177000]    #4
      [   30.177000] kvm-stealtime: cpu 3, msr 211acd400
      
      One CPU booted up per 10 seconds. With 120 CPUs that takes a while.
      
      Bisection pinpointed this commit:
      
        853b160a ("Revert f5d6a52f ("x86/smpboot: Skip delays during SMP initialization similar to Xen")")
      
      But that commit just restores previous behavior, so it cannot cause the
      problem. After some head scratching it turns out that these two commits:
      
        1a744cb3 ("x86/smp/boot: Remove 10ms delay from cpu_up() on modern processors")
        d68921f9 ("x86/smp/boot: Add cmdline "cpu_init_udelay=N" to specify cpu_up() delay")
      
      added the following code to smpboot.c:
      
      -               mdelay(10);
      +               mdelay(init_udelay);
      
      Note the mismatch in the units: the delay is called 'udelay' and is set
      to microseconds - while the function used here is actually 'mdelay',
      which counts in milliseconds ...
      
      So the delay for legacy systems is off by a factor of 1,000, so instead
      of 10 msecs we waited for 10 seconds ...
      
      The reason bisection pointed to 853b160a was that 853b160a removed
      a (broken) boot-time speedup patch, which masked the factor 1,000 bug.
      
      Fix it by using udelay(). This fixes my bootup problems.
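
To make the unit mismatch concrete, here is a minimal userspace sketch in C (not the kernel code itself). It models the two delay functions as plain unit conversions. The helper names `mdelay_cost_us()`, `udelay_cost_us()` and `total_boot_delay_us()` are hypothetical, and the value 10000 for `init_udelay` is an assumption inferred from the "10 msecs" versus "10 seconds" figures above.

```c
#include <assert.h>
#include <stdint.h>

/* Time spent, in microseconds, by mdelay(n): n is interpreted as milliseconds. */
uint64_t mdelay_cost_us(uint64_t n)
{
    return n * 1000u;
}

/* Time spent, in microseconds, by udelay(n): n is already in microseconds. */
uint64_t udelay_cost_us(uint64_t n)
{
    return n;
}

/*
 * Total delay for bringing up all secondary CPUs, assuming one delay per
 * secondary CPU (the boot CPU is already running and needs none).
 */
uint64_t total_boot_delay_us(unsigned int ncpus, uint64_t per_cpu_us)
{
    assert(ncpus >= 1);
    return (uint64_t)(ncpus - 1) * per_cpu_us;
}
```

With an assumed `init_udelay` of 10000, `mdelay_cost_us(10000)` comes to 10 seconds per CPU, while `udelay_cost_us(10000)` is the intended 10 milliseconds. For the 120-CPU case mentioned above, that is the difference between roughly 20 minutes and roughly 1.2 seconds of accumulated delay.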
      
      Cc: Len Brown <len.brown@intel.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan H. Schönherr <jschoenh@amazon.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 17 May, 2015 2 commits
    • x86/asm/entry/64: Use shorter MOVs from segment registers · adeb5537
      Denys Vlasenko committed
      The "movw %ds,%cx" instruction needs a 0x66 prefix, while
      "movl %ds,%ecx" does not.
      
      The difference is that the latter form (on 64-bit CPUs)
      overwrites the entire %ecx, not only its lower half.
      
      But subsequent code doesn't depend on the value of the upper
      half of %ecx, so we can safely use the shorter instruction.
      
      The new code is also faster than the old one - now we don't
      depend on the old value of %ecx, but this code fragment is
      not performance-critical so it does not matter much.
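
The difference between the two forms can be sketched in C as a small register model. This is an illustration under stated assumptions, not code from the patch: the function names are hypothetical, the byte encodings are what I would expect an assembler to emit for these two instructions (worth confirming with objdump), and the 32-bit form is modeled as zeroing the upper half of the destination.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings: the 16-bit form carries an extra 0x66 operand-size prefix. */
const uint8_t movw_ds_cx[]  = { 0x66, 0x8c, 0xd9 }; /* movw %ds,%cx  */
const uint8_t movl_ds_ecx[] = { 0x8c, 0xd9 };       /* movl %ds,%ecx */

/* The prefix costs exactly one byte. */
static_assert(sizeof movw_ds_cx == sizeof movl_ds_ecx + 1,
              "16-bit form should be one byte longer");

/*
 * movw writes only the low 16 bits, so the result merges the selector with
 * the old upper half: the instruction depends on the old register value.
 */
uint32_t movw_model(uint32_t old_ecx, uint16_t ds)
{
    return (old_ecx & 0xffff0000u) | ds;
}

/* movl overwrites the whole register; the old value is irrelevant. */
uint32_t movl_model(uint32_t old_ecx, uint16_t ds)
{
    (void)old_ecx;
    return ds;
}

/* Code that only reads %cx sees the same value after either instruction. */
int low_halves_agree(uint32_t old_ecx, uint16_t ds)
{
    return (uint16_t)movw_model(old_ecx, ds) == (uint16_t)movl_model(old_ecx, ds);
}
```

The model shows why the substitution is safe here: as long as later code reads only the low half, both forms are interchangeable, and the 32-bit form no longer needs the previous contents of the register.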
      
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1431722346-26585-1-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm/head*.S: Change global labels to local · e839004b
      Borislav Petkov committed
      Make the disassembly look less confusing:
      
        -- head_64.o.before.asm
        ++ head_64.o.after.asm
         0000000000000120 <early_idt_handler>:
          120:	fc                   	cld
          121:	83 3c 24 02          	cmpl   $0x2,(%rsp)
        - 125:	0f 84 9d 00 00 00    	je     1c8 <is_nmi>
        + 125:	0f 84 9d 00 00 00    	je     1c8 <early_idt_handler+0xa8>
          12b:	83 3d 00 00 00 00 02 	cmpl   $0x2,0x0(%rip)        # 132 <early_idt_handler+0x12>
          132:	74 7e                	je     1b2 <early_idt_handler+0x92>
          134:	ff 05 00 00 00 00    	incl   0x0(%rip)        # 13a <early_idt_handler+0x1a>
        @@ -1198,9 +1198,7 @@ Disassembly of section .init.text:
          1bf:	5a                   	pop    %rdx
          1c0:	59                   	pop    %rcx
          1c1:	58                   	pop    %rax
        - 1c2:	ff 0d 00 00 00 00    	decl   0x0(%rip)        # 1c8 <is_nmi>
        -
        -00000000000001c8 <is_nmi>:
        + 1c2:	ff 0d 00 00 00 00    	decl   0x0(%rip)        # 1c8 <early_idt_handler+0xa8>
          1c8:	48 83 c4 10          	add    $0x10,%rsp
          1cc:	48 cf                	iretq
      
        -- head_32.o.before.asm
        ++ head_32.o.after.asm
         0000016c <early_idt_handler>:
          16c:  fc                      cld
          16d:  83 3c 24 02             cmpl   $0x2,(%esp)
        - 171:  74 73                   je     1e6 <is_nmi>
        + 171:  74 73                   je     1e6 <ex_entry+0xc>
          173:  36 83 3d 00 00 00 00    cmpl   $0x2,%ss:0x0
          17a:  02
          17b:  74 5a                   je     1d7 <hlt_loop>
        @@ -483,8 +483,6 @@ Disassembly of section .init.text:
          1dd:  59                      pop    %ecx
          1de:  58                      pop    %eax
          1df:  36 ff 0d 00 00 00 00    decl   %ss:0x0
        -
        -000001e6 <is_nmi>:
          1e6:  83 c4 08                add    $0x8,%esp
          1e9:  cf                      iret
          1ea:  66 90                   xchg   %ax,%ax
      
      No functionality change.
      
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1431793079-11153-1-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 15 May, 2015 2 commits
  4. 13 May, 2015 3 commits
    • x86, irq: Allocate CPU vectors from device local CPUs if possible · 486ca539
      Jiang Liu committed
      On NUMA systems, an IO device may be associated with a NUMA node.
      It may improve IO performance to allocate resources, such as memory
      and interrupts, from the device-local node.
      
      This patch introduces a mechanism to support CPU vector allocation
      policies. It tries to allocate CPU vectors from CPUs on the device-local
      node first, and then falls back to all online (global) CPUs.
      
      This mechanism may be used to support NumaConnect systems to allocate
      CPU vectors from the device-local node.
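
The "local node first, then global" policy described above can be sketched as follows. This is a simplified userspace model, not the kernel implementation: CPUs are bits in a 64-bit mask, and `cpumask_t`, `first_cpu_in()` and `pick_vector_cpu()` are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit i of a mask stands for CPU i (at most 64 CPUs). */
typedef uint64_t cpumask_t;

/* Return the lowest-numbered CPU present in both masks, or -1 if there is none. */
int first_cpu_in(cpumask_t candidates, cpumask_t has_free_vector)
{
    cpumask_t usable = candidates & has_free_vector;

    for (int cpu = 0; cpu < 64; cpu++)
        if (usable & ((cpumask_t)1 << cpu))
            return cpu;
    return -1;
}

/*
 * Try the CPUs on the device's own node first; if none of them has a free
 * vector, fall back to every online CPU.
 */
int pick_vector_cpu(cpumask_t node_local, cpumask_t online, cpumask_t has_free_vector)
{
    int cpu;

    assert(online != 0); /* at least the calling CPU is online */

    cpu = first_cpu_in(node_local & online, has_free_vector);
    if (cpu >= 0)
        return cpu;
    return first_cpu_in(online, has_free_vector);
}
```

For a device on a node holding CPUs 2 and 3, the sketch picks CPU 2 while it still has a free vector, and only moves to a remote CPU once the local ones are exhausted.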
      
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Tested-by: Daniel J Blueman <daniel@numascale.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/1430967244-28905-1-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/hpet: Pass proper pointer to irq_alloc_info · 4a00c95d
      Sergey Senozhatsky committed
      Fix the following oops:
       hpet_msi_get_hwirq+0x1f/0x27
       msi_domain_alloc+0x35/0xfe
       ? trace_hardirqs_on_caller+0x16c/0x188
       irq_domain_alloc_irqs_recursive+0x51/0x95
       __irq_domain_alloc_irqs+0x151/0x223
       hpet_assign_irq+0x5d/0x68
       hpet_msi_capability_lookup+0x121/0x1cb
       ? hpet_enable+0x2b4/0x2b4
       hpet_late_init+0x5f/0xf2
       ? hpet_enable+0x2b4/0x2b4
       do_one_initcall+0x184/0x199
       kernel_init_freeable+0x1af/0x237
       ? rest_init+0x13a/0x13a
       kernel_init+0xe/0xd4
       ret_from_fork+0x3f/0x70
       ? rest_init+0x13a/0x13a
      
      Since 3cb96f0c ('x86/hpet: Enhance HPET IRQ to support
      hierarchical irqdomains') hpet_msi_capability_lookup() uses
      hpet_assign_irq(). The latter initializes irq_alloc_info on stack, but
      passes a NULL pointer to irq_domain_alloc_irqs(), which causes a NULL
      pointer dereference later in hpet_msi_get_hwirq().
      
      Pass the pointer to the irq_alloc_info to irq_domain_alloc_irqs().
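
The bug pattern (a structure filled in on the stack, then NULL passed instead of its address) can be reproduced in a few lines of userspace C. Everything here is a hypothetical stand-in: `struct alloc_info`, `domain_alloc()`, `assign_irq_buggy()` and `assign_irq_fixed()` are illustration names, and the callee reports an error where the kernel code would dereference the pointer and oops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the allocation descriptor. */
struct alloc_info {
    int hwirq;
};

/*
 * Callee. The kernel path would dereference 'info' unconditionally and
 * crash on NULL; this model returns -1 instead so the failure is observable.
 */
int domain_alloc(const struct alloc_info *info)
{
    if (info == NULL)
        return -1;
    return info->hwirq;
}

/* Buggy caller: initializes the descriptor, then passes NULL anyway. */
int assign_irq_buggy(int hwirq)
{
    struct alloc_info info = { .hwirq = hwirq };

    assert(hwirq >= 0);
    (void)info; /* filled in but never handed to the callee */
    return domain_alloc(NULL);
}

/* Fixed caller: passes the address of the descriptor it just initialized. */
int assign_irq_fixed(int hwirq)
{
    struct alloc_info info = { .hwirq = hwirq };

    assert(hwirq >= 0);
    return domain_alloc(&info);
}
```

The two callers differ only in the argument handed to `domain_alloc()`, which mirrors how small the actual fix is.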
      
      Fixes: 3cb96f0c 'x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains'
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Link: http://lkml.kernel.org/r/20150512041444.GA1094@swordfish
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • Revert f5d6a52f ("x86/smpboot: Skip delays during SMP initialization similar to Xen") · 853b160a
      Ingo Molnar committed
      Huang Ying reported x86 boot hangs due to this commit.
      
      Turns out that the change, despite its changelog, does more
      than just change timeouts: it also changes the way we
      assert/deassert INIT via the APIC_DM_INIT IPI. In the x2apic
      case it skips the deassert step.
      
      This is historically fragile code and the patch did not
      improve it, so revert these changes.
      
      This commit:
      
        1a744cb3 ("x86/smp/boot: Remove 10ms delay from cpu_up() on modern processors")
      
      independently removes the worst of the delays (the 10 msec delay).
      
      The remaining delays can be addressed one by one, combined
      with careful testing.
      
      Reported-by: Huang Ying <ying.huang@intel.com>
      Cc: Anthony Liguori <aliguori@amazon.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Gang Wei <gang.wei@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan H. Schönherr <jschoenh@amazon.de>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Deegan <tim@xen.org>
      Link: http://lkml.kernel.org/r/1430732554-7294-1-git-send-email-jschoenh@amazon.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 12 May, 2015 2 commits
  6. 11 May, 2015 2 commits
    • perf/x86/rapl: Enable Broadwell-U RAPL support · 44b11fee
      Stephane Eranian committed
      This patch enables support for RAPL counters (energy consumption
      counters) on Intel Broadwell-U processors (Model 61).
      
      To use:
      
        $ perf stat -a -I 1000 -e power/energy-cores/,power/energy-pkg/,power/energy-ram/ sleep 10
      
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jacob.jun.pan@linux.intel.com
      Cc: kan.liang@intel.com
      Cc: peterz@infradead.org
      Cc: sonnyrao@chromium.org
      Link: http://lkml.kernel.org/r/20150423070709.GA4970@thinkpad
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/alternatives: Switch AMD F15h and later to the P6 NOPs · f21262b8
      Borislav Petkov committed
      Software optimization guides for both F15h and F16h cite those
      NOPs as the optimal ones. A microbenchmark confirms that
      actually even older families are better with the single-insn
      NOPs so switch to them for the alternatives.
      
      Cycles count below includes the loop overhead of the measurement
      but that overhead is the same with all runs.
      
      	F10h, revE:
      	-----------
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     288.212282 cycles
      			   66 90     288.220840 cycles
      			66 66 90     288.219447 cycles
      		     66 66 66 90     288.223204 cycles
      		  66 66 90 66 90     571.393424 cycles
      	       66 66 90 66 66 90     571.374919 cycles
      	    66 66 66 90 66 66 90     572.249281 cycles
      	 66 66 66 90 66 66 66 90     571.388651 cycles
      
      	P6:
      			      90     288.214193 cycles
      			   66 90     288.225550 cycles
      			0f 1f 00     288.224441 cycles
      		     0f 1f 40 00     288.225030 cycles
      		  0f 1f 44 00 00     288.233558 cycles
      	       66 0f 1f 44 00 00     324.792342 cycles
      	    0f 1f 80 00 00 00 00     325.657462 cycles
      	 0f 1f 84 00 00 00 00 00     430.246643 cycles
      
      	F14h:
      	----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     510.404890 cycles
      			   66 90     510.432117 cycles
      			66 66 90     510.561858 cycles
      		     66 66 66 90     510.541865 cycles
      		  66 66 90 66 90    1014.192782 cycles
      	       66 66 90 66 66 90    1014.226546 cycles
      	    66 66 66 90 66 66 90    1014.334299 cycles
      	 66 66 66 90 66 66 66 90    1014.381205 cycles
      
      	P6:
      			      90     510.436710 cycles
      			   66 90     510.448229 cycles
      			0f 1f 00     510.545100 cycles
      		     0f 1f 40 00     510.502792 cycles
      		  0f 1f 44 00 00     510.589517 cycles
      	       66 0f 1f 44 00 00     510.611462 cycles
      	    0f 1f 80 00 00 00 00     511.166794 cycles
      	 0f 1f 84 00 00 00 00 00     511.651641 cycles
      
      	F15h:
      	-----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     243.128396 cycles
      			   66 90     243.129883 cycles
      			66 66 90     243.131631 cycles
      		     66 66 66 90     242.499324 cycles
      		  66 66 90 66 90     481.829083 cycles
      	       66 66 90 66 66 90     481.884413 cycles
      	    66 66 66 90 66 66 90     481.851446 cycles
      	 66 66 66 90 66 66 66 90     481.409220 cycles
      
      	P6:
      			      90     243.127026 cycles
      			   66 90     243.130711 cycles
      			0f 1f 00     243.122747 cycles
      		     0f 1f 40 00     242.497617 cycles
      		  0f 1f 44 00 00     245.354461 cycles
      	       66 0f 1f 44 00 00     361.930417 cycles
      	    0f 1f 80 00 00 00 00     362.844944 cycles
      	 0f 1f 84 00 00 00 00 00     480.514948 cycles
      
      	F16h:
      	-----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     507.793298 cycles
      			   66 90     507.789636 cycles
      			66 66 90     507.826490 cycles
      		     66 66 66 90     507.859075 cycles
      		  66 66 90 66 90    1008.663129 cycles
      	       66 66 90 66 66 90    1008.696259 cycles
      	    66 66 66 90 66 66 90    1008.692517 cycles
      	 66 66 66 90 66 66 66 90    1008.755399 cycles
      
      	P6:
      			      90     507.795232 cycles
      			   66 90     507.794761 cycles
      			0f 1f 00     507.834901 cycles
      		     0f 1f 40 00     507.822629 cycles
      		  0f 1f 44 00 00     507.838493 cycles
      	       66 0f 1f 44 00 00     507.908597 cycles
      	    0f 1f 80 00 00 00 00     507.946417 cycles
      	 0f 1f 84 00 00 00 00 00     507.954960 cycles
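
As an illustration of how these encodings get used, the sketch below pads a buffer with the P6 NOPs listed in the tables above, always choosing the longest single instruction that still fits. It is a userspace approximation under my own assumptions, not the kernel's alternatives code; `p6_nops`, `MAX_NOP_LEN` and `fill_nops()` are hypothetical names. The byte sequences themselves are copied from the P6 rows of the benchmark output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NOP_LEN 8

/* P6 NOP encodings, indexed by (length - 1), as listed in the tables above. */
static const uint8_t p6_nops[MAX_NOP_LEN][MAX_NOP_LEN] = {
    { 0x90 },
    { 0x66, 0x90 },
    { 0x0f, 0x1f, 0x00 },
    { 0x0f, 0x1f, 0x40, 0x00 },
    { 0x0f, 0x1f, 0x44, 0x00, 0x00 },
    { 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },
    { 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },
    { 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

/*
 * Fill buf[0..len) with NOPs, using the longest encoding that fits at each
 * step. Returns the number of NOP instructions emitted.
 */
size_t fill_nops(uint8_t *buf, size_t len)
{
    size_t count = 0;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        size_t n = len < MAX_NOP_LEN ? len : MAX_NOP_LEN;

        memcpy(buf, p6_nops[n - 1], n);
        buf += n;
        len -= n;
        count++;
    }
    return count;
}
```

Padding 11 bytes this way yields one 8-byte NOP followed by one 3-byte NOP. The K8 rows in the tables, by contrast, appear to build the 5- to 8-byte paddings from two shorter instructions each, which would match the roughly doubled cycle counts shown for those rows.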
      
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1431332153-18566-2-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 10 May, 2015 3 commits
  8. 08 May, 2015 6 commits
  9. 06 May, 2015 4 commits
  10. 05 May, 2015 2 commits
  11. 27 April, 2015 2 commits
  12. 24 April, 2015 11 commits