1. 01 11月, 2015 3 次提交
  2. 31 10月, 2015 1 次提交
  3. 27 10月, 2015 1 次提交
    • W
      x86/ioapic: Prevent NULL pointer dereference in setup_ioapic_dest() · ababae44
      Werner Pawlitschko 提交于
      Commit 4857c91f changed the way how irq affinity is setup in
      setup_ioapic_dest() from using the core helper function to
      unconditionally calling the irq_set_affinity() callback of the
      underlying irq chip.
      
      That results in a NULL pointer dereference for the rare case where the
      underlying irq chip is lapic_chip which has no irq_set_affinity()
      callback. lapic_chip is occasionally used for the timer interrupt (irq
      0).
      
      The fix is simple: Check the availability of the callback instead of
      calling it unconditionally.
      
      Fixes: 4857c91f "x86/ioapic: Force affinity setting in setup_ioapic_dest()"
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      ababae44
  4. 26 10月, 2015 1 次提交
  5. 21 10月, 2015 12 次提交
    • B
      x86/microcode/intel: Move #ifdef DEBUG inside the function · c595ac2b
      Borislav Petkov 提交于
      ... and save us the stub.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-6-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c595ac2b
    • B
      x86/microcode/amd: Remove maintainers from comments · 6f7fc44b
      Borislav Petkov 提交于
      We have the MAINTAINERS file for that. Also, Andreas doesn't
      have the time for this work anymore.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-5-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6f7fc44b
    • B
      x86/microcode: Remove modularization leftovers · 6b26e1bf
      Borislav Petkov 提交于
      Remove the remaining module functionality leftovers. Make
      "dis_ucode_ldr" an early_param and make it static again. Drop
      module aliases, autoloading table, description, etc.
      
      Bump version number, while at it.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6b26e1bf
    • B
      x86/microcode: Merge the early microcode loader · fe055896
      Borislav Petkov 提交于
      Merge the early loader functionality into the driver proper. The
      diff is huge but logically, it is simply moving code from the
      _early.c files into the main driver.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fe055896
    • B
      x86/microcode: Unmodularize the microcode driver · 9a2bc335
      Borislav Petkov 提交于
      Make CONFIG_MICROCODE a bool. It was practically a bool already anyway,
      since early loader was forcing it to =y.
      
      Regardless, there's no real reason to have something be a module which
      gets built-in on the majority of installations out there. And its not
      like there's noticeable change in functionality - we still can load late
      microcode - just the module glue disappears.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9a2bc335
    • J
      timers/x86/hpet: Type adjustments · 3d45ac4b
      Jan Beulich 提交于
      Standardize on bool instead of an inconsistent mixture of u8 and plain 'int'.
      
      Also use u32 or 'unsigned int' instead of 'unsigned long' when a 32-bit type
      suffices, generating slightly better code on x86-64.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/5624E3A002000078000AC49A@prv-mh.provo.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3d45ac4b
    • A
      x86/mce: Fix thermal throttling reporting after kexec · 81ffdcdd
      Andi Kleen 提交于
      The per CPU thermal vector init code checks if the thermal
      vector is already installed and complains and bails out if it
      is.
      
      This happens after kexec, as kernel shut down does not clear the
      thermal vector APIC register.
      
      This causes two problems:
      
      1. So we always do not fully initialize thermal reports after
         kexec. The CPU is still likely initialized, as the previous
         kernel should have done it. But we don't set up the software
         pointer to the thermal vector, so reporting may end up with a
         unknown thermal interrupt message.
      
      2. Also it complains for every logical CPU, even though the
         value is actually derived from BP only.
      
      The problem is that we end up with one message per CPU, so on
      larger systems it becomes very noisy and messes up the otherwise
      nicely formatted CPU bootup numbers in the kernel log.
      
      Just remove the check. I checked the code and there's no valid
      code paths where the thermal init code for a CPU could be called
      multiple times.
      
      Why the kernel does not clean up this value on shutdown:
      
      The thermal monitoring is controlled per logical CPU thread.
      Normal shutdown code is just running on one CPU. To disable it
      we would need a broadcast NMI to all CPUs on shut down. That's
      overkill for this. So we just ignore it after kexec.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1445246268-26285-9-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      81ffdcdd
    • B
      x86/setup/crash: Check memblock_reserve() retval · 6f376057
      Borislav Petkov 提交于
      memblock_reserve() can fail but the crashkernel reservation code
      doesn't check that and this can lead the user into believing
      that the crashkernel region was actually reserved. Make sure we
      check that return value and we exit early with a failure message
      in the error case.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-7-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6f376057
    • B
      x86/setup/crash: Cleanup some more · f56d5578
      Borislav Petkov 提交于
      * Remove unused auto_set variable
      * Cleanup local function variable declarations
      * Reformat printk string and use pr_info()
      
      No functionality change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-6-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f56d5578
    • B
      x86/setup/crash: Remove alignment variable · 606134f7
      Borislav Petkov 提交于
      Use a macro instead. No functionality change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-5-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      606134f7
    • B
      x86/setup: Cleanup crashkernel reservation functions · 97eac21b
      Borislav Petkov 提交于
      * Shorten variable names
      * Realign code, space out for better readability
      
      No code changed:
      
        # arch/x86/kernel/setup.o:
      
         text    data     bss     dec     hex filename
         4543    3096   69904   77543   12ee7 setup.o.before
         4543    3096   69904   77543   12ee7 setup.o.after
      
      md5:
         8a1b7c6738a553ca207b56bd84a8f359  setup.o.before.asm
         8a1b7c6738a553ca207b56bd84a8f359  setup.o.after.asm
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      97eac21b
    • B
      x86/setup: Do not reserve crashkernel high memory if low reservation failed · eb6db83d
      Baoquan He 提交于
      People reported that when allocating crashkernel memory using
      the ",high" and ",low" syntax, there were cases where the
      reservation of the high portion succeeds but the reservation of
      the low portion fails.
      
      Then kexec can load the kdump kernel successfully, but booting
      the kdump kernel fails as there's no low memory.
      
      The low memory allocation for the kdump kernel can fail on large
      systems for a couple of reasons. For example, the manually
      specified crashkernel low memory can be too large and thus no
      adequate memblock region would be found.
      
      Therefore, we try to reserve low memory for the crash kernel
      *after* the high memory portion has been allocated. If that
      fails, we free crashkernel high memory too and return. The user
      can then take measures accordingly.
      Tested-by: NJoerg Roedel <jroedel@suse.de>
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      [ Massage text. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Cc: yinghai@kernel.org
      Link: http://lkml.kernel.org/r/1445246268-26285-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      eb6db83d
  6. 20 10月, 2015 3 次提交
  7. 19 10月, 2015 3 次提交
  8. 16 10月, 2015 2 次提交
    • V
      x86/ioapic: Disable interrupts when re-routing legacy IRQs · c0ff971e
      Vitaly Kuznetsov 提交于
      A sporadic hang with consequent crash is observed when booting Hyper-V Gen1
      guests:
      
       Call Trace:
        <IRQ>
        [<ffffffff810ab68d>] ? trace_hardirqs_off+0xd/0x10
        [<ffffffff8107b616>] queue_work_on+0x46/0x90
        [<ffffffff81365696>] ? add_interrupt_randomness+0x176/0x1d0
        ...
        <EOI>
        [<ffffffff81471ddb>] ? _raw_spin_unlock_irqrestore+0x3b/0x60
        [<ffffffff810c295e>] __irq_put_desc_unlock+0x1e/0x40
        [<ffffffff810c5c35>] irq_modify_status+0xb5/0xd0
        [<ffffffff8104adbb>] mp_register_handler+0x4b/0x70
        [<ffffffff8104c55a>] mp_irqdomain_alloc+0x1ea/0x2a0
        [<ffffffff810c7f10>] irq_domain_alloc_irqs_recursive+0x40/0xa0
        [<ffffffff810c860c>] __irq_domain_alloc_irqs+0x13c/0x2b0
        [<ffffffff8104b070>] alloc_isa_irq_from_domain.isra.1+0xc0/0xe0
        [<ffffffff8104bfa5>] mp_map_pin_to_irq+0x165/0x2d0
        [<ffffffff8104c157>] pin_2_irq+0x47/0x80
        [<ffffffff81744253>] setup_IO_APIC+0xfe/0x802
        ...
        [<ffffffff814631c0>] ? rest_init+0x140/0x140
      
      The issue is easily reproducible with a simple instrumentation: if
      mdelay(10) is put between mp_setup_entry() and mp_register_handler() calls
      in mp_irqdomain_alloc() Hyper-V guest always fails to boot when re-routing
      IRQ0. The issue seems to be caused by the fact that we don't disable
      interrupts while doing IOPIC programming for legacy IRQs and IRQ0 actually
      happens. 
      
      Protect the setup sequence against concurrent interrupts.
      
      [ tglx: Make the protection unconditional and not only for legacy
        	interrupts ]
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Link: http://lkml.kernel.org/r/1444930943-19336-1-git-send-email-vkuznets@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      c0ff971e
    • P
      x86/setup: Extend low identity map to cover whole kernel range · f5f3497c
      Paolo Bonzini 提交于
      On 32-bit systems, the initial_page_table is reused by
      efi_call_phys_prolog as an identity map to call
      SetVirtualAddressMap.  efi_call_phys_prolog takes care of
      converting the current CPU's GDT to a physical address too.
      
      For PAE kernels the identity mapping is achieved by aliasing the
      first PDPE for the kernel memory mapping into the first PDPE
      of initial_page_table.  This makes the EFI stub's trick "just work".
      
      However, for non-PAE kernels there is no guarantee that the identity
      mapping in the initial_page_table extends as far as the GDT; in this
      case, accesses to the GDT will cause a page fault (which quickly becomes
      a triple fault).  Fix this by copying the kernel mappings from
      swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
      identity mapping.
      
      For some reason, this is only reproducible with QEMU's dynamic translation
      mode, and not for example with KVM.  However, even under KVM one can clearly
      see that the page table is bogus:
      
          $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize
          $ gdb
          (gdb) target remote localhost:1234
          (gdb) hb *0x02858f6f
          Hardware assisted breakpoint 1 at 0x2858f6f
          (gdb) c
          Continuing.
      
          Breakpoint 1, 0x02858f6f in ?? ()
          (gdb) monitor info registers
          ...
          GDT=     0724e000 000000ff
          IDT=     fffbb000 000007ff
          CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690
          ...
      
      The page directory is sane:
      
          (gdb) x/4wx 0x32b7000
          0x32b7000:	0x03398063	0x03399063	0x0339a063	0x0339b063
          (gdb) x/4wx 0x3398000
          0x3398000:	0x00000163	0x00001163	0x00002163	0x00003163
          (gdb) x/4wx 0x3399000
          0x3399000:	0x00400003	0x00401003	0x00402003	0x00403003
      
      but our particular page directory entry is empty:
      
          (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4
          0x32b7070:	0x00000000
      
      [ It appears that you can skate past this issue if you don't receive
        any interrupts while the bogus GDT pointer is loaded, or if you avoid
        reloading the segment registers in general.
      
        Andy Lutomirski provides some additional insight:
      
         "AFAICT it's entirely permissible for the GDTR and/or LDT
          descriptor to point to unmapped memory.  Any attempt to use them
          (segment loads, interrupts, IRET, etc) will try to access that memory
          as if the access came from CPL 0 and, if the access fails, will
          generate a valid page fault with CR2 pointing into the GDT or
          LDT."
      
        Up until commit 23a0d4e8 ("efi: Disable interrupts around EFI
        calls, not in the epilog/prolog calls") interrupts were disabled
        around the prolog and epilog calls, and the functional GDT was
        re-installed before interrupts were re-enabled.
      
        Which explains why no one has hit this issue until now. ]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reported-by: NLaszlo Ersek <lersek@redhat.com>
      Cc: <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      [ Updated changelog. ]
      f5f3497c
  9. 14 10月, 2015 1 次提交
  10. 12 10月, 2015 5 次提交
  11. 09 10月, 2015 1 次提交
  12. 07 10月, 2015 1 次提交
  13. 06 10月, 2015 3 次提交
    • T
      perf/x86/intel/uncore: Fix multi-segment problem of perf_event_intel_uncore · 712df65c
      Taku Izumi 提交于
      In multi-segment system, uncore devices may belong to buses whose segment
      number is other than 0:
      
        ....
        0000:ff:10.5 System peripheral: Intel Corporation Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 03)
        ...
        0001:7f:10.5 System peripheral: Intel Corporation Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 03)
        ...
        0001:bf:10.5 System peripheral: Intel Corporation Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 03)
        ...
        0001:ff:10.5 System peripheral: Intel Corporation Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 03
        ...
      
      In that case, relation of bus number and physical id may be broken
      because "uncore_pcibus_to_physid" doesn't take account of PCI segment.
      For example, bus 0000:ff and 0001:ff uses the same entry of
      "uncore_pcibus_to_physid" array.
      
      This patch fixes this problem by introducing the segment-aware pci2phy_map instead.
      Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/1443096621-4119-1-git-send-email-izumi.taku@jp.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      712df65c
    • K
      perf/x86: Add Intel cstate PMUs support · 7ce1346a
      Kan Liang 提交于
      This patch adds new PMUs to support cstate related free running
      (read-only) counters. These counters may be used simultaneously by other
      tools, such as turbostat. However, it still make sense to implement them
      in perf. Because we can conveniently collect them together with other
      events, and allow to use them from tools without special MSR access
      code.
      
      These counters include CORE_C*_RESIDENCY and PKG_C*_RESIDENCY.
      According to counters' scope and category, two PMUs are registered with
      the perf_event core subsystem.
      
       - 'cstate_core': The counter is available for each physical core. The
                        counters include CORE_C*_RESIDENCY.
      
       - 'cstate_pkg':  The counter is available for each physical package. The
                        counters include PKG_C*_RESIDENCY.
      
      The events are exposed in sysfs for use by perf stat and other tools.
      The files are:
      
        /sys/devices/cstate_core/events/c*-residency
        /sys/devices/cstate_pkg/events/c*-residency
      
      These events only support system-wide mode counting.
      The /sys/devices/cstate_*/cpumask file can be used by tools to figure
      out which CPUs to monitor by default.
      
      The PMU type (attr->type) is dynamically allocated and is available from
      /sys/devices/core_misc/type and /sys/device/cstate_*/type.
      
      Sampling is not supported.
      
      Here is an example.
      
       - To caculate the fraction of time when the core is running in C6 state
         CORE_C6_time% = CORE_C6_RESIDENCY / TSC
      
       # perf stat -x, -e"cstate_core/c6-residency/,msr/tsc/" -C0 -- taskset -c 0 sleep 5
      
         11838820015,,cstate_core/c6-residency/,5175919658,100.00
         11877130740,,msr/tsc/,5175922010,100.00
      
       For sleep, 99.7% of time we ran in C6 state.
      
       # perf stat -x, -e"cstate_core/c6-residency/,msr/tsc/" -C0 -- taskset -c 0 busyloop
      
         1253316,,cstate_core/c6-residency/,4360969154,100.00
         10012635248,,msr/tsc/,4360972366,100.00
      
       For busyloop, 0.01% of time we ran in C6 state.
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1443443404-8581-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7ce1346a
    • P
      sched/core, sched/x86: Kill thread_info::saved_preempt_count · d87b7a33
      Peter Zijlstra 提交于
      With the introduction of the context switch preempt_count invariant,
      and the demise of PREEMPT_ACTIVE, its pointless to save/restore the
      per-cpu preemption count, it must always be 2.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      d87b7a33
  14. 02 10月, 2015 1 次提交
    • L
      x86/kexec: Fix kexec crash in syscall kexec_file_load() · e3c41e37
      Lee, Chun-Yi 提交于
      The original bug is a page fault crash that sometimes happens
      on big machines when preparing ELF headers:
      
          BUG: unable to handle kernel paging request at ffffc90613fc9000
          IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
      
      The bug is caused by us under-counting the number of memory ranges
      and subsequently not allocating enough ELF header space for them.
      The bug is typically masked on smaller systems, because the ELF header
      allocation is rounded up to the next page.
      
      This patch modifies the code in fill_up_crash_elf_data() by using
      walk_system_ram_res() instead of walk_system_ram_range() to correctly
      count the max number of crash memory ranges. That's because the
      walk_system_ram_range() filters out small memory regions that
      reside in the same page, but walk_system_ram_res() does not.
      
      Here's how I found the bug:
      
      After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
      the code uses walk_system_ram_res() to fill-in crash memory regions information
      to the program header, so it counts those small memory regions that
      reside in a page area.
      
      But, when the kernel was using walk_system_ram_range() in
      fill_up_crash_elf_data() to count the number of crash memory regions,
      it filters out small regions.
      
      I printed those small memory regions, for example:
      
        kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
      
      Based on the code in walk_system_ram_range(), this memory region
      will be filtered out:
      
        pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
        end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
        end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
      
      So, the max_nr_ranges that's counted by the kernel doesn't include
      small memory regions - causing us to under-allocate the required space.
      That causes the page fault crash that happens in a later code path
      when preparing ELF headers.
      
      This bug is not easy to reproduce on small machines that have few
      CPUs, because the allocated page aligned ELF buffer has more free
      space to cover those small memory regions' PT_LOAD headers.
      Signed-off-by: NLee, Chun-Yi <jlee@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e3c41e37
  15. 01 10月, 2015 2 次提交