1. 04 8月, 2015 19 次提交
  2. 31 7月, 2015 3 次提交
  3. 26 7月, 2015 1 次提交
    • M
      perf/x86/intel/cqm: Return cached counter value from IRQ context · 2c534c0d
      Matt Fleming 提交于
      Peter reported the following potential crash which I was able to
      reproduce with his test program,
      
      [  148.765788] ------------[ cut here ]------------
      [  148.765796] WARNING: CPU: 34 PID: 2840 at kernel/smp.c:417 smp_call_function_many+0xb6/0x260()
      [  148.765797] Modules linked in:
      [  148.765800] CPU: 34 PID: 2840 Comm: perf Not tainted 4.2.0-rc1+ #4
      [  148.765803]  ffffffff81cdc398 ffff88085f105950 ffffffff818bdfd5 0000000000000007
      [  148.765805]  0000000000000000 ffff88085f105990 ffffffff810e413a 0000000000000000
      [  148.765807]  ffffffff82301080 0000000000000022 ffffffff8107f640 ffffffff8107f640
      [  148.765809] Call Trace:
      [  148.765810]  <NMI>  [<ffffffff818bdfd5>] dump_stack+0x45/0x57
      [  148.765818]  [<ffffffff810e413a>] warn_slowpath_common+0x8a/0xc0
      [  148.765822]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
      [  148.765824]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
      [  148.765825]  [<ffffffff810e422a>] warn_slowpath_null+0x1a/0x20
      [  148.765827]  [<ffffffff811613f6>] smp_call_function_many+0xb6/0x260
      [  148.765829]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
      [  148.765831]  [<ffffffff81161748>] on_each_cpu_mask+0x28/0x60
      [  148.765832]  [<ffffffff8107f6ef>] intel_cqm_event_count+0x7f/0xe0
      [  148.765836]  [<ffffffff811cdd35>] perf_output_read+0x2a5/0x400
      [  148.765839]  [<ffffffff811d2e5a>] perf_output_sample+0x31a/0x590
      [  148.765840]  [<ffffffff811d333d>] ? perf_prepare_sample+0x26d/0x380
      [  148.765841]  [<ffffffff811d3497>] perf_event_output+0x47/0x60
      [  148.765843]  [<ffffffff811d36c5>] __perf_event_overflow+0x215/0x240
      [  148.765844]  [<ffffffff811d4124>] perf_event_overflow+0x14/0x20
      [  148.765847]  [<ffffffff8107e7f4>] intel_pmu_handle_irq+0x1d4/0x440
      [  148.765849]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
      [  148.765853]  [<ffffffff81219bad>] ? vunmap_page_range+0x19d/0x2f0
      [  148.765854]  [<ffffffff81219d11>] ? unmap_kernel_range_noflush+0x11/0x20
      [  148.765859]  [<ffffffff814ce6fe>] ? ghes_copy_tofrom_phys+0x11e/0x2a0
      [  148.765863]  [<ffffffff8109e5db>] ? native_apic_msr_write+0x2b/0x30
      [  148.765865]  [<ffffffff8109e44d>] ? x2apic_send_IPI_self+0x1d/0x20
      [  148.765869]  [<ffffffff81065135>] ? arch_irq_work_raise+0x35/0x40
      [  148.765872]  [<ffffffff811c8d86>] ? irq_work_queue+0x66/0x80
      [  148.765875]  [<ffffffff81075306>] perf_event_nmi_handler+0x26/0x40
      [  148.765877]  [<ffffffff81063ed9>] nmi_handle+0x79/0x100
      [  148.765879]  [<ffffffff81064422>] default_do_nmi+0x42/0x100
      [  148.765880]  [<ffffffff81064563>] do_nmi+0x83/0xb0
      [  148.765884]  [<ffffffff818c7c0f>] end_repeat_nmi+0x1e/0x2e
      [  148.765886]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
      [  148.765888]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
      [  148.765890]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
      [  148.765891]  <<EOE>>  [<ffffffff8110ab66>] finish_task_switch+0x156/0x210
      [  148.765898]  [<ffffffff818c1671>] __schedule+0x341/0x920
      [  148.765899]  [<ffffffff818c1c87>] schedule+0x37/0x80
      [  148.765903]  [<ffffffff810ae1af>] ? do_page_fault+0x2f/0x80
      [  148.765905]  [<ffffffff818c1f4a>] schedule_user+0x1a/0x50
      [  148.765907]  [<ffffffff818c666c>] retint_careful+0x14/0x32
      [  148.765908] ---[ end trace e33ff2be78e14901 ]---
      
      The CQM task events are not safe to be called from within interrupt
      context because they require performing an IPI to read the counter value
      on all sockets. And performing IPIs from within IRQ context is a
      "no-no".
      
      Make do with the last read counter value currently event in
      event->count when we're invoked in this context.
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vikas Shivappa <vikas.shivappa@intel.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Will Auld <will.auld@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      2c534c0d
  4. 21 7月, 2015 1 次提交
  5. 18 7月, 2015 2 次提交
  6. 17 7月, 2015 2 次提交
  7. 15 7月, 2015 1 次提交
    • T
      genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now · ce0d3c0a
      Thomas Gleixner 提交于
      Boris reported that the sparse_irq protection around __cpu_up() in the
      generic code causes a regression on Xen. Xen allocates interrupts and
      some more in the xen_cpu_up() function, so it deadlocks on the
      sparse_irq_lock.
      
      There is no simple fix for this and we really should have the
      protection for all architectures, but for now the only solution is to
      move it to x86 where actual wreckage due to the lack of protection has
      been observed.
      Reported-and-tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Fixes: a8994181 'hotplug: Prevent alloc/free of irq descriptors during cpu up/down'
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: xiao jin <jin.xiao@intel.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Cc: xen-devel <xen-devel@lists.xenproject.org>
      ce0d3c0a
  8. 07 7月, 2015 3 次提交
    • T
      x86/irq: Retrieve irq data after locking irq_desc · 09cf92b7
      Thomas Gleixner 提交于
      irq_data is protected by irq_desc->lock, so retrieving the irq chip
      from irq_data outside the lock is racy vs. an concurrent update. Move
      it into the lock held region.
      
      While at it add a comment why the vector walk does not require
      vector_lock.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: xiao jin <jin.xiao@intel.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/20150705171102.331320612@linutronix.de
      09cf92b7
    • T
      x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable() · cbb24dc7
      Thomas Gleixner 提交于
      It's unsafe to examine fields in the irq descriptor w/o holding the
      descriptor lock. Add proper locking.
      
      While at it add a comment why the vector check can run lock less
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: xiao jin <jin.xiao@intel.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/20150705171102.236544164@linutronix.de
      cbb24dc7
    • T
      x86/irq: Plug irq vector hotplug race · 5a3f75e3
      Thomas Gleixner 提交于
      Jin debugged a nasty cpu hotplug race which results in leaking a irq
      vector on the newly hotplugged cpu.
      
      cpu N				cpu M
      native_cpu_up                   device_shutdown
        do_boot_cpu			  free_msi_irqs
        start_secondary                   arch_teardown_msi_irqs
          smp_callin                        default_teardown_msi_irqs
             setup_vector_irq                  arch_teardown_msi_irq
              __setup_vector_irq		   native_teardown_msi_irq
                lock(vector_lock)		     destroy_irq 
                install vectors
                unlock(vector_lock)
      					       lock(vector_lock)
      --->                                  	       __clear_irq_vector
                                          	       unlock(vector_lock)
          lock(vector_lock)
          set_cpu_online
          unlock(vector_lock)
      
      This leaves the irq vector(s) which are torn down on CPU M stale in
      the vector array of CPU N, because CPU M does not see CPU N online
      yet. There is a similar issue with concurrent newly setup interrupts.
      
      The alloc/free protection of irq descriptors does not prevent the
      above race, because it merily prevents interrupt descriptors from
      going away or changing concurrently.
      
      Prevent this by moving the call to setup_vector_irq() into the
      vector_lock held region which protects set_cpu_online():
      
      cpu N				cpu M
      native_cpu_up                   device_shutdown
        do_boot_cpu			  free_msi_irqs
        start_secondary                   arch_teardown_msi_irqs
          smp_callin                        default_teardown_msi_irqs
             lock(vector_lock)                arch_teardown_msi_irq
             setup_vector_irq()
              __setup_vector_irq		   native_teardown_msi_irq
                install vectors		     destroy_irq 
             set_cpu_online
             unlock(vector_lock)
      					       lock(vector_lock)
                                        	       __clear_irq_vector
                                          	       unlock(vector_lock)
      
      So cpu M either sees the cpu N online before clearing the vector or
      cpu N installs the vectors after cpu M has cleared it.
      Reported-by: Nxiao jin <jin.xiao@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/20150705171102.141898931@linutronix.de
      5a3f75e3
  9. 06 7月, 2015 6 次提交
    • S
      x86/earlyprintk: Allow early_printk() to use console style parameters like '115200n8' · 827a82ff
      Steven Rostedt 提交于
      When I enable early_printk on a kernel, I cut and paste the
      console= input and add to earlyprintk parameter. But I notice
      recently that ktest has not been detecting triple faults. The
      way it detects it, is by seeing the kernel banner "Linux version
      .." with a different kernel version pop up. Then I noticed that
      early printk was no longer working on my console, which was why
      ktest was not seeing it.
      
      I bisected it down and it was added to 4.0 with this commit:
      
        ea9e9d80 ("Specify PCI based UART for earlyprintk")
      
      because it converted the simple_strtoul() that converts the baud
      number into a kstrtoul(). The problem with this is, I had as my
      baud rate, 115200n8 (acceptable for console=ttyS0), but because
      of the "n8", the kstrtoul() doesn't parse the baud rate and
      returns an error, which sets the baud rate to the default 9600.
      This explains the garbage on my screen.
      
      Now, earlyprintk= kernel parameter does not say it accepts that
      format. Thus, one answer would simply be me changing my kernel
      parameters to remove the "n8" since it isn't parsed anyway. But
      I wonder if other people run into this, and it seems strange
      that the two consoles for serial accepts different input.
      
      I could also extend this to have earlyprintk do something with
      that "n8" or whatever it has and have it match the console
      parsing (which, BTW, still uses simple_strtoul(), as I guess it
      has to).
      
      This patch just makes my old kernel parameter parsing work like
      it use to.
      
      Although, simple_strtoull() is considered obsolete, it is the
      only standard string parsing function that parses a number that
      is attached to text. Ironically, commit ea9e9d80 also added
      several calls to simple_strtoul()!
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Cohen <david.a.cohen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stuart R. Anderson <stuart.r.anderson@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150706101434.5f6a351b@gandalf.local.home
      [ Cleaned it up a bit. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      827a82ff
    • Z
      x86/espfix: Init espfix on the boot CPU side · 20d5e4a9
      Zhu Guihua 提交于
      As we alloc pages with GFP_KERNEL in init_espfix_ap() which is
      called before we enable local irqs, so the lockdep sub-system
      would (correctly) trigger a warning about the potentially
      blocking API.
      
      So we allocate them on the boot CPU side when the secondary CPU is
      brought up by the boot CPU, and hand them over to the secondary
      CPU.
      
      And we use alloc_pages_node() with the secondary CPU's node, to
      make sure the espfix stack is NUMA-local to the CPU that is
      going to use it.
      Signed-off-by: NZhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Cc: <bp@alien8.de>
      Cc: <luto@amacapital.net>
      Cc: <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/c97add2670e9abebb90095369f0cfc172373ac94.1435824469.git.zhugh.fnst@cn.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      20d5e4a9
    • Z
      x86/espfix: Add 'cpu' parameter to init_espfix_ap() · 1db87563
      Zhu Guihua 提交于
      Add a CPU index parameter to init_espfix_ap(), so that the
      parameter could be propagated to the function for espfix
      page allocation.
      Signed-off-by: NZhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Cc: <bp@alien8.de>
      Cc: <luto@amacapital.net>
      Cc: <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/cde3fcf1b3211f3f03feb1a995bce3fee850f0fc.1435824469.git.zhugh.fnst@cn.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1db87563
    • A
      x86/kasan: Fix KASAN shadow region page tables · 5d5aa3cf
      Alexander Popov 提交于
      Currently KASAN shadow region page tables created without
      respect of physical offset (phys_base). This causes kernel halt
      when phys_base is not zero.
      
      So let's initialize KASAN shadow region page tables in
      kasan_early_init() using __pa_nodebug() which considers
      phys_base.
      
      This patch also separates x86_64_start_kernel() from KASAN low
      level details by moving kasan_map_early_shadow(init_level4_pgt)
      into kasan_early_init().
      
      Remove the comment before clear_bss() which stopped bringing
      much profit to the code readability. Otherwise describing all
      the new order dependencies would be too verbose.
      Signed-off-by: NAlexander Popov <alpopov@ptsecurity.com>
      Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Cc: <stable@vger.kernel.org> # 4.0+
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5d5aa3cf
    • A
      x86/init: Clear 'init_level4_pgt' earlier · d0f77d4d
      Andrey Ryabinin 提交于
      Currently x86_64_start_kernel() has two KASAN related
      function calls. The first call maps shadow to early_level4_pgt,
      the second maps shadow to init_level4_pgt.
      
      If we move clear_page(init_level4_pgt) earlier, we could hide
      KASAN low level detail from generic x86_64 initialization code.
      The next patch will do it.
      Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Cc: <stable@vger.kernel.org> # 4.0+
      Cc: Alexander Popov <alpopov@ptsecurity.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1435828178-10975-2-git-send-email-a.ryabinin@samsung.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d0f77d4d
    • A
      x86/tsc: Let high latency PIT fail fast in quick_pit_calibrate() · 5aac644a
      Adrian Hunter 提交于
      If it takes longer than 12us to read the PIT counter lsb/msb,
      then the error margin will never fall below 500ppm within 50ms,
      and Fast TSC calibration will always fail.
      
      This patch detects when that will happen and fails fast. Note
      the failure message is not printed in that case because:
      1. it will always happen on that class of hardware
      2. the absence of the message is more informative than its
      presence
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: http://lkml.kernel.org/r/556EB717.9070607@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5aac644a
  10. 04 7月, 2015 1 次提交
    • I
      x86/fpu: Fix boot crash in the early FPU code · b96fecbf
      Ingo Molnar 提交于
      Jan Kara and Thomas Gleixner reported boot crashes in the FPU
      code:
      
        general protection fault: 0000 [#1] SMP
        RIP: 0010:[<ffffffff81048a6c>]  [<ffffffff81048a6c>] mxcsr_feature_mask_init+0x1c/0x40
      
        2b:*  0f ae 85 00 fe ff ff    fxsave -0x200(%rbp)
      
      and bisected it down to the following FPU commit:
      
         91a8c2a5 ("x86/fpu: Clean up and fix MXCSR handling")
      
      The reason is that the on-stack FPU registers state variable,
      used by the FXSAVE instruction, did not have the required
      minimum alignment of 16 bytes, causing the general protection
      fault.
      
      This is most likely a GCC bug in older GCC versions, but the
      offending commit also added a bogus extra 32-byte alignment
      (which GCC ignored too).
      
      So fix this bug by making the variable static again, but also
      mark it __initdata this time, because fpu__init_system_mxcsr()
      is now an __init function.
      Reported-and-bisected-by: NJan Kara <jack@suse.cz>
      Reported-bisected-and-tested-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150704075819.GA9201@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b96fecbf
  11. 01 7月, 2015 1 次提交