1. 07 Nov, 2015 7 commits
  2. 06 Nov, 2015 1 commit
    • livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX · e2391a2d
      Committed by Josh Poimboeuf
      When loading a patch module on a kernel with
      !CONFIG_DEBUG_SET_MODULE_RONX, the following crash occurs:
      
        [  205.988776] livepatch: enabling patch 'kpatch_meminfo_string'
        [  205.989829] BUG: unable to handle kernel paging request at ffffffffa08d2fc0
        [  205.989863] IP: [<ffffffff8154fecb>] do_init_module+0x8c/0x1ba
        [  205.989888] PGD 1a10067 PUD 1a11063 PMD 7bcde067 PTE 3740e161
        [  205.989915] Oops: 0003 [#1] SMP
        [  205.990187] CPU: 2 PID: 14570 Comm: insmod Tainted: G           O  K 4.1.12
        [  205.990214] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
        [  205.990249] task: ffff8800374aaa90 ti: ffff8800794b8000 task.ti: ffff8800794b8000
        [  205.990276] RIP: 0010:[<ffffffff8154fecb>]  [<ffffffff8154fecb>] do_init_module+0x8c/0x1ba
        [  205.990307] RSP: 0018:ffff8800794bbd58  EFLAGS: 00010246
        [  205.990327] RAX: 0000000000000000 RBX: ffffffffa08d2fc0 RCX: 0000000000000000
        [  205.990356] RDX: 01ffff8000000080 RSI: 0000000000000000 RDI: ffffffff81a54b40
        [  205.990382] RBP: ffff88007b4c4d80 R08: 0000000000000007 R09: 0000000000000000
        [  205.990408] R10: 0000000000000008 R11: ffffea0001f18840 R12: 0000000000000000
        [  205.990433] R13: 0000000000000001 R14: ffffffffa08d2fc0 R15: ffff88007bd0bc40
        [  205.990459] FS:  00007f1128fbc700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
        [  205.990488] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [  205.990509] CR2: ffffffffa08d2fc0 CR3: 000000002606e000 CR4: 00000000001406e0
        [  205.990536] Stack:
        [  205.990545]  ffff8800794bbec8 0000000000000001 ffffffffa08d3010 ffffffff810ecea9
        [  205.990576]  ffffffff810e8e40 000000000005f360 ffff88007bd0bc50 ffffffffa08d3240
        [  205.990608]  ffffffffa08d52c0 ffffffffa08d3210 ffff8800794bbed8 ffff8800794bbf1c
        [  205.990639] Call Trace:
        [  205.990651]  [<ffffffff810ecea9>] ? load_module+0x1e59/0x23a0
        [  205.990672]  [<ffffffff810e8e40>] ? store_uevent+0x40/0x40
        [  205.990693]  [<ffffffff810e99b5>] ? copy_module_from_fd.isra.49+0xb5/0x140
        [  205.990718]  [<ffffffff810ed5bd>] ? SyS_finit_module+0x7d/0xa0
        [  205.990741]  [<ffffffff81556832>] ? system_call_fastpath+0x16/0x75
        [  205.990763] Code: f9 00 00 00 74 23 49 c7 c0 92 e1 60 81 48 8d 53 18 89 c1 4c 89 c6 48 c7 c7 f0 85 7d 81 31 c0 e8 71 fa ff ff e8 58 0e 00 00 31 f6 <c7> 03 00 00 00 00 48 89 da 48 c7 c7 20 c7 a5 81 e8 d0 ec b3 ff
        [  205.990916] RIP  [<ffffffff8154fecb>] do_init_module+0x8c/0x1ba
        [  205.990940]  RSP <ffff8800794bbd58>
        [  205.990953] CR2: ffffffffa08d2fc0
      
      With !CONFIG_DEBUG_SET_MODULE_RONX, module text and rodata pages are
      writable, and the debug_align() macro allows the module struct to share
      a page with executable text.  When klp_write_module_reloc() calls
      set_memory_ro() on the page, it effectively turns the module struct into
      a read-only structure, resulting in a page fault when load_module() does
      "mod->state = MODULE_STATE_LIVE".
      Reported-by: Cyril B. <cbay@alwaysdata.com>
      Tested-by: Cyril B. <cbay@alwaysdata.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
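The failure described in the livepatch commit comes down to two addresses landing on the same page, so that write-protecting one also write-protects the other. A minimal userspace sketch of that check, assuming 4 KiB pages; `same_page()` and `ro_text_write_protects()` are hypothetical helpers for illustration, not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Return true when both addresses fall inside the same page. */
static bool same_page(uint64_t a, uint64_t b)
{
	return (a >> SKETCH_PAGE_SHIFT) == (b >> SKETCH_PAGE_SHIFT);
}

/*
 * Making the page that holds `text` read-only also write-protects `obj`
 * whenever the two share a page.
 */
static bool ro_text_write_protects(uint64_t text, uint64_t obj)
{
	assert(text != 0 && obj != 0);
	return same_page(text, obj);
}
```

With page-aligned sections the two addresses can never share a page, which is why the crash only shows up when that alignment is not enforced.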
  3. 01 Nov, 2015 3 commits
  4. 31 Oct, 2015 1 commit
  5. 27 Oct, 2015 1 commit
    • x86/ioapic: Prevent NULL pointer dereference in setup_ioapic_dest() · ababae44
      Committed by Werner Pawlitschko
      Commit 4857c91f changed how irq affinity is set up in
      setup_ioapic_dest(): instead of using the core helper function, it
      unconditionally calls the irq_set_affinity() callback of the
      underlying irq chip.
      
      That results in a NULL pointer dereference for the rare case where the
      underlying irq chip is lapic_chip which has no irq_set_affinity()
      callback. lapic_chip is occasionally used for the timer interrupt (irq
      0).
      
      The fix is simple: Check the availability of the callback instead of
      calling it unconditionally.
      
      Fixes: 4857c91f "x86/ioapic: Force affinity setting in setup_ioapic_dest()"
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
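The fix described in the ioapic commit is the usual "check the callback before calling it" pattern. A minimal userspace sketch under stated assumptions: `struct sketch_chip` and the `sketch_*` functions are hypothetical stand-ins and do not reproduce the kernel's irq chip API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an irq chip: the callback may be absent. */
struct sketch_chip {
	int (*set_affinity)(int irq, unsigned int cpu);
};

static int sketch_set_affinity(int irq, unsigned int cpu)
{
	(void)irq;
	(void)cpu;
	return 0;
}

/* Returns 0 on success, -1 when the chip offers no callback. */
static int sketch_setup_dest(const struct sketch_chip *chip, int irq,
			     unsigned int cpu)
{
	assert(chip != NULL);
	if (!chip->set_affinity)	/* a chip without the callback */
		return -1;
	return chip->set_affinity(irq, cpu);
}
```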
  6. 26 Oct, 2015 1 commit
  7. 23 Oct, 2015 1 commit
  8. 21 Oct, 2015 12 commits
    • x86/microcode/intel: Move #ifdef DEBUG inside the function · c595ac2b
      Committed by Borislav Petkov
      ... and save us the stub.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-6-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
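The "move #ifdef DEBUG inside the function" idea can be sketched as follows. This is an illustration of the general pattern only; `sketch_show_saved()` and its call counter are hypothetical and are not taken from the microcode driver.

```c
#include <assert.h>
#include <stdio.h>

/* Counts calls so the effect is observable; hypothetical helper state. */
static int sketch_dump_calls;

/*
 * One definition serves both configurations: with DEBUG undefined the
 * body does almost nothing, so no separate empty stub is needed.
 */
static void sketch_show_saved(const unsigned int *revs, int n)
{
	assert(n >= 0);
	sketch_dump_calls++;
#ifdef DEBUG
	for (int i = 0; i < n; i++)
		printf("saved patch %d: rev 0x%x\n", i, revs[i]);
#else
	(void)revs;
	(void)n;
#endif
}
```

Callers stay identical in both builds, which is what removes the need for the stub.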
    • x86/microcode/amd: Remove maintainers from comments · 6f7fc44b
      Committed by Borislav Petkov
      We have the MAINTAINERS file for that. Also, Andreas doesn't
      have the time for this work anymore.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-5-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/microcode: Remove modularization leftovers · 6b26e1bf
      Committed by Borislav Petkov
      Remove the remaining module functionality leftovers. Make
      "dis_ucode_ldr" an early_param and make it static again. Drop
      module aliases, autoloading table, description, etc.
      
      Bump version number, while at it.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-4-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/microcode: Merge the early microcode loader · fe055896
      Committed by Borislav Petkov
      Merge the early loader functionality into the driver proper. The
      diff is huge but logically, it is simply moving code from the
      _early.c files into the main driver.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-3-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/microcode: Unmodularize the microcode driver · 9a2bc335
      Committed by Borislav Petkov
      Make CONFIG_MICROCODE a bool. It was practically a bool already anyway,
      since the early loader was forcing it to =y.
      
      Regardless, there's no real reason to have something be a module which
      gets built-in on the majority of installations out there. And it's not
      like there's a noticeable change in functionality - we still can load late
      microcode - just the module glue disappears.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-2-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • timers/x86/hpet: Type adjustments · 3d45ac4b
      Committed by Jan Beulich
      Standardize on bool instead of an inconsistent mixture of u8 and plain 'int'.
      
      Also use u32 or 'unsigned int' instead of 'unsigned long' when a 32-bit type
      suffices, generating slightly better code on x86-64.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/5624E3A002000078000AC49A@prv-mh.provo.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mce: Fix thermal throttling reporting after kexec · 81ffdcdd
      Committed by Andi Kleen
      The per CPU thermal vector init code checks if the thermal
      vector is already installed and complains and bails out if it
      is.
      
      This happens after kexec, as kernel shut down does not clear the
      thermal vector APIC register.
      
      This causes two problems:
      
      1. We never fully initialize thermal reporting after
         kexec. The CPU is still likely initialized, as the previous
         kernel should have done it. But we don't set up the software
         pointer to the thermal vector, so reporting may end up with an
         unknown thermal interrupt message.
      
      2. Also it complains for every logical CPU, even though the
         value is actually derived from BP only.
      
      The problem is that we end up with one message per CPU, so on
      larger systems it becomes very noisy and messes up the otherwise
      nicely formatted CPU bootup numbers in the kernel log.
      
      Just remove the check. I checked the code and there are no valid
      code paths where the thermal init code for a CPU could be called
      multiple times.
      
      Why the kernel does not clean up this value on shutdown:
      
      The thermal monitoring is controlled per logical CPU thread.
      Normal shutdown code is just running on one CPU. To disable it
      we would need a broadcast NMI to all CPUs on shut down. That's
      overkill for this. So we just ignore it after kexec.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1445246268-26285-9-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/setup/crash: Check memblock_reserve() retval · 6f376057
      Committed by Borislav Petkov
      memblock_reserve() can fail but the crashkernel reservation code
      doesn't check that and this can lead the user into believing
      that the crashkernel region was actually reserved. Make sure we
      check that return value and we exit early with a failure message
      in the error case.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Young <dyoung@redhat.com>
      Reviewed-by: Joerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-7-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
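The return-value check described in the memblock_reserve() commit can be sketched in userspace as below. `sketch_reserve()` is a hypothetical stub that can fail; it is not the real memblock_reserve() and its `limit` parameter is an assumption made so the failure can be triggered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stub standing in for a reservation that can fail. */
static int sketch_reserve(uint64_t base, uint64_t size, uint64_t limit)
{
	return (size == 0 || base + size > limit) ? -1 : 0;
}

/* Returns 0 if the region was reserved, -1 after bailing out early. */
static int sketch_reserve_crashkernel(uint64_t base, uint64_t size,
				      uint64_t limit)
{
	int ret = sketch_reserve(base, size, limit);

	assert(ret == 0 || ret == -1);
	if (ret) {
		/* Report failure instead of pretending the region exists. */
		return -1;
	}
	/* Only now is it safe to record base/size for later use. */
	return 0;
}
```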
    • x86/setup/crash: Cleanup some more · f56d5578
      Committed by Borislav Petkov
      * Remove unused auto_set variable
      * Cleanup local function variable declarations
      * Reformat printk string and use pr_info()
      
      No functionality change.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Young <dyoung@redhat.com>
      Reviewed-by: Joerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-6-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/setup/crash: Remove alignment variable · 606134f7
      Committed by Borislav Petkov
      Use a macro instead. No functionality change.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Young <dyoung@redhat.com>
      Reviewed-by: Joerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-5-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/setup: Cleanup crashkernel reservation functions · 97eac21b
      Committed by Borislav Petkov
      * Shorten variable names
      * Realign code, space out for better readability
      
      No code changed:
      
        # arch/x86/kernel/setup.o:
      
         text    data     bss     dec     hex filename
         4543    3096   69904   77543   12ee7 setup.o.before
         4543    3096   69904   77543   12ee7 setup.o.after
      
      md5:
         8a1b7c6738a553ca207b56bd84a8f359  setup.o.before.asm
         8a1b7c6738a553ca207b56bd84a8f359  setup.o.after.asm
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Young <dyoung@redhat.com>
      Reviewed-by: Joerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-4-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/setup: Do not reserve crashkernel high memory if low reservation failed · eb6db83d
      Committed by Baoquan He
      People reported that when allocating crashkernel memory using
      the ",high" and ",low" syntax, there were cases where the
      reservation of the high portion succeeds but the reservation of
      the low portion fails.
      
      Then kexec can load the kdump kernel successfully, but booting
      the kdump kernel fails as there's no low memory.
      
      The low memory allocation for the kdump kernel can fail on large
      systems for a couple of reasons. For example, the manually
      specified crashkernel low memory can be too large and thus no
      adequate memblock region would be found.
      
      Therefore, we try to reserve low memory for the crash kernel
      *after* the high memory portion has been allocated. If that
      fails, we free crashkernel high memory too and return. The user
      can then take measures accordingly.
      Tested-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      [ Massage text. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Joerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Cc: yinghai@kernel.org
      Link: http://lkml.kernel.org/r/1445246268-26285-2-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
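The ordering described in the crashkernel high/low commit (reserve high, then low, and release high if low fails) can be sketched as a small rollback routine. Everything here is a hypothetical userspace model: `struct sketch_state` and the `sketch_*` functions are invented for illustration and the `low_will_fail` flag simply simulates a failing low reservation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for the sketch; not kernel state. */
struct sketch_state {
	bool high_reserved;
	bool low_reserved;
	bool low_will_fail;	/* simulates a too-large ",low" request */
};

static int sketch_reserve_low(struct sketch_state *s)
{
	if (s->low_will_fail)
		return -1;
	s->low_reserved = true;
	return 0;
}

/* Returns 0 when both portions are reserved, -1 when neither is. */
static int sketch_reserve_crash(struct sketch_state *s)
{
	assert(s);
	s->high_reserved = true;		/* high portion first */
	if (sketch_reserve_low(s)) {
		s->high_reserved = false;	/* roll back the high portion */
		return -1;
	}
	return 0;
}
```

The point of the rollback is that the caller never ends up with a high-only reservation that loads but cannot boot.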
  9. 20 Oct, 2015 3 commits
  10. 19 Oct, 2015 3 commits
  11. 16 Oct, 2015 2 commits
    • x86/ioapic: Disable interrupts when re-routing legacy IRQs · c0ff971e
      Committed by Vitaly Kuznetsov
      A sporadic hang with consequent crash is observed when booting Hyper-V Gen1
      guests:
      
       Call Trace:
        <IRQ>
        [<ffffffff810ab68d>] ? trace_hardirqs_off+0xd/0x10
        [<ffffffff8107b616>] queue_work_on+0x46/0x90
        [<ffffffff81365696>] ? add_interrupt_randomness+0x176/0x1d0
        ...
        <EOI>
        [<ffffffff81471ddb>] ? _raw_spin_unlock_irqrestore+0x3b/0x60
        [<ffffffff810c295e>] __irq_put_desc_unlock+0x1e/0x40
        [<ffffffff810c5c35>] irq_modify_status+0xb5/0xd0
        [<ffffffff8104adbb>] mp_register_handler+0x4b/0x70
        [<ffffffff8104c55a>] mp_irqdomain_alloc+0x1ea/0x2a0
        [<ffffffff810c7f10>] irq_domain_alloc_irqs_recursive+0x40/0xa0
        [<ffffffff810c860c>] __irq_domain_alloc_irqs+0x13c/0x2b0
        [<ffffffff8104b070>] alloc_isa_irq_from_domain.isra.1+0xc0/0xe0
        [<ffffffff8104bfa5>] mp_map_pin_to_irq+0x165/0x2d0
        [<ffffffff8104c157>] pin_2_irq+0x47/0x80
        [<ffffffff81744253>] setup_IO_APIC+0xfe/0x802
        ...
        [<ffffffff814631c0>] ? rest_init+0x140/0x140
      
      The issue is easily reproducible with simple instrumentation: if
      mdelay(10) is put between the mp_setup_entry() and mp_register_handler()
      calls in mp_irqdomain_alloc(), a Hyper-V guest always fails to boot when
      re-routing IRQ0. The issue seems to be caused by the fact that we don't
      disable interrupts while doing IO-APIC programming for legacy IRQs, and
      IRQ0 actually fires during that window.
      
      Protect the setup sequence against concurrent interrupts.
      
      [ tglx: Make the protection unconditional and not only for legacy
        interrupts ]
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Link: http://lkml.kernel.org/r/1444930943-19336-1-git-send-email-vkuznets@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/setup: Extend low identity map to cover whole kernel range · f5f3497c
      Committed by Paolo Bonzini
      On 32-bit systems, the initial_page_table is reused by
      efi_call_phys_prolog as an identity map to call
      SetVirtualAddressMap.  efi_call_phys_prolog takes care of
      converting the current CPU's GDT to a physical address too.
      
      For PAE kernels the identity mapping is achieved by aliasing the
      first PDPE for the kernel memory mapping into the first PDPE
      of initial_page_table.  This makes the EFI stub's trick "just work".
      
      However, for non-PAE kernels there is no guarantee that the identity
      mapping in the initial_page_table extends as far as the GDT; in this
      case, accesses to the GDT will cause a page fault (which quickly becomes
      a triple fault).  Fix this by copying the kernel mappings from
      swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
      identity mapping.
      
      For some reason, this is only reproducible with QEMU's dynamic translation
      mode, and not for example with KVM.  However, even under KVM one can clearly
      see that the page table is bogus:
      
          $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize
          $ gdb
          (gdb) target remote localhost:1234
          (gdb) hb *0x02858f6f
          Hardware assisted breakpoint 1 at 0x2858f6f
          (gdb) c
          Continuing.
      
          Breakpoint 1, 0x02858f6f in ?? ()
          (gdb) monitor info registers
          ...
          GDT=     0724e000 000000ff
          IDT=     fffbb000 000007ff
          CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690
          ...
      
      The page directory is sane:
      
          (gdb) x/4wx 0x32b7000
          0x32b7000:	0x03398063	0x03399063	0x0339a063	0x0339b063
          (gdb) x/4wx 0x3398000
          0x3398000:	0x00000163	0x00001163	0x00002163	0x00003163
          (gdb) x/4wx 0x3399000
          0x3399000:	0x00400003	0x00401003	0x00402003	0x00403003
      
      but our particular page directory entry is empty:
      
          (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4
          0x32b7070:	0x00000000
      
      [ It appears that you can skate past this issue if you don't receive
        any interrupts while the bogus GDT pointer is loaded, or if you avoid
        reloading the segment registers in general.
      
        Andy Lutomirski provides some additional insight:
      
         "AFAICT it's entirely permissible for the GDTR and/or LDT
          descriptor to point to unmapped memory.  Any attempt to use them
          (segment loads, interrupts, IRET, etc) will try to access that memory
          as if the access came from CPL 0 and, if the access fails, will
          generate a valid page fault with CR2 pointing into the GDT or
          LDT."
      
        Up until commit 23a0d4e8 ("efi: Disable interrupts around EFI
        calls, not in the epilog/prolog calls") interrupts were disabled
        around the prolog and epilog calls, and the functional GDT was
        re-installed before interrupts were re-enabled.
      
        Which explains why no one has hit this issue until now. ]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reported-by: Laszlo Ersek <lersek@redhat.com>
      Cc: <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      [ Updated changelog. ]
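The gdb expression `0x32b7000 + (0x724e000 >> 22) * 4` in the identity-map commit locates the page directory entry that covers the GDT. A minimal sketch of that arithmetic, assuming non-PAE 32-bit paging (4-byte entries, 4 MiB covered per entry); `sketch_pde_addr()` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Non-PAE 32-bit paging: each page directory entry maps 4 MiB, is 4 bytes. */
#define SKETCH_PGDIR_SHIFT 22
#define SKETCH_PDE_SIZE    4u

/* Address of the page directory entry covering `addr`. */
static uint32_t sketch_pde_addr(uint32_t pgdir_base, uint32_t addr)
{
	assert((pgdir_base & 0xfffu) == 0);	/* page directory is page-aligned */
	return pgdir_base + (addr >> SKETCH_PGDIR_SHIFT) * SKETCH_PDE_SIZE;
}
```

For the values in the commit message, the index is 0x724e000 >> 22 = 28, so the entry sits at offset 28 * 4 = 0x70 from CR3, matching the address gdb prints.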
  12. 15 Oct, 2015 1 commit
    • x86, ACPI: Handle apic/x2apic entries in MADT in correct order · d81056b5
      Committed by Lukasz Anaczkowski
      ACPI specifies the following rules when listing APIC IDs:
      (1) Boot processor is listed first
      (2) For multi-threaded processors, BIOS should list the first logical
          processor of each of the individual multi-threaded processors in MADT
          before listing any of the second logical processors.
      (3) APIC IDs < 0xFF should be listed in APIC subtable, APIC IDs >= 0xFF
          should be listed in X2APIC subtable
      
      Because of the above, when there are more than 0xFF logical CPUs, the
      BIOS interleaves APIC/X2APIC subtables.
      
      Assuming there are 72 cores with 4 hyper-threads each, 288 CPUs total,
      the listing looks like this:
      
      APIC (0,4,8, .., 252)
      X2APIC (258,260,264, .. 284)
      APIC (1,5,9,...,253)
      X2APIC (259,261,265,...,285)
      APIC (2,6,10,...,254)
      X2APIC (260,262,266,..,286)
      APIC (3,7,11,...,251)
      X2APIC (255,261,262,266,..,287)
      
      Before this patch, due to how ACPI MADT subtables were parsed (BSP,
      then X2APIC, then APIC), the kernel enumerated CPUs in reversed order
      (i.e. high APIC IDs were getting low logical IDs, and low APIC IDs were
      getting high logical IDs).
      This is wrong for the following reasons:
      (1) it's hard to predict how cores and threads are enumerated
      (2) when it's hard to predict, s/w threads cannot be properly affinitized,
          causing significant performance impact due to e.g. improper cache
          sharing
      (3) enumeration is inconsistent with how threads are enumerated on
          other Intel Xeon processors
      
      So the order in which the MADT APIC/X2APIC handlers are passed is
      reversed, and both handlers are passed so that they are called during
      the same MADT table walk, which achieves correct CPU enumeration.
      
      If someone boots the kernel with the options 'maxcpus=72 nox2apic',
      fewer cores may be booted as a result, since some of the CPUs the kernel
      will try to use will have APIC ID >= 0xFF. In such a case, one
      should not pass 'nox2apic'.
      
      Disclaimer: the code parsing MADT APIC/X2APIC has not been touched since
      2009, when X2APIC support was initially added. I do not know why the MADT
      parsing code was added in the reversed order in the first place.
      I guess it didn't matter at that time since nobody cared about cores
      with APIC IDs >= 0xFF, right?
      
      This patch is based on work of "Yinghai Lu <yinghai@kernel.org>"
      previously published at https://lkml.org/lkml/2013/1/21/563
      
      Here's the explanation of why the parsing interface needs to be changed
      and why a simpler approach will not work: https://lkml.org/lkml/2015/9/7/285
      Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de> (commit message)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
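The difference between parsing one subtable type at a time and walking the table once can be sketched as follows. This is a simplified userspace model, not the ACPI parsing code: the entry layout and the `sketch_*` names are hypothetical, and the special handling of the boot processor is left out.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_type { SKETCH_APIC, SKETCH_X2APIC };

struct sketch_entry {
	enum sketch_type type;
	unsigned int apic_id;
};

/* Two passes, one subtable type each: X2APIC entries get the low logical IDs. */
static size_t sketch_two_pass(const struct sketch_entry *e, size_t n,
			      unsigned int *out)
{
	size_t k = 0;

	assert(e && out);
	for (size_t i = 0; i < n; i++)
		if (e[i].type == SKETCH_X2APIC)
			out[k++] = e[i].apic_id;
	for (size_t i = 0; i < n; i++)
		if (e[i].type == SKETCH_APIC)
			out[k++] = e[i].apic_id;
	return k;
}

/* One walk handling both types: logical IDs follow the table order. */
static size_t sketch_one_pass(const struct sketch_entry *e, size_t n,
			      unsigned int *out)
{
	assert(e && out);
	for (size_t i = 0; i < n; i++)
		out[i] = e[i].apic_id;
	return n;
}
```

In both functions the position in `out` plays the role of the logical CPU number, so the single walk keeps the order the firmware chose.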
  13. 14 Oct, 2015 1 commit
  14. 12 Oct, 2015 3 commits
    • x86/microcode/amd: Do not overwrite final patch levels · 0399f732
      Committed by Borislav Petkov
      A certain number of patch levels of applied microcode should not
      be overwritten by the microcode loader, otherwise bad things
      will happen.
      
      Check those and abort update if the current core has one of
      those final patch levels applied by the BIOS. 32-bit needs
      special handling, of course.
      
      See https://bugzilla.suse.com/show_bug.cgi?id=913996 for more
      info.
      Tested-by: Peter Kirchgeßner <pkirchgessner@t-online.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1444641762-9437-7-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
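The "check the current patch level and abort the update if it is final" logic can be sketched as a table lookup. The values in `sketch_final_levels` are placeholders chosen for illustration, not real AMD patch levels, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for illustration only; not real patch levels. */
static const uint32_t sketch_final_levels[] = { 0x11111111, 0x22222222 };

static bool sketch_is_final(uint32_t level)
{
	size_t n = sizeof(sketch_final_levels) / sizeof(sketch_final_levels[0]);

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		if (sketch_final_levels[i] == level)
			return true;
	return false;
}

/* Returns true when an update may proceed, false when it must be aborted. */
static bool sketch_may_update(uint32_t current_level)
{
	return !sketch_is_final(current_level);
}
```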
    • x86/microcode/amd: Extract current patch level read to a function · 2eff73c0
      Committed by Borislav Petkov
      Pave the way for checking the current patch level of the
      microcode in a core. We want to be able to do stuff depending on
      the patch level - in this case decide whether to update or not.
      But that will be added in a later patch.
      
      Drop unused local var uci assignment, while at it.
      
      Integrate a fix for 32-bit and CONFIG_PARAVIRT from Takashi Iwai:
      
       Use native_rdmsr() in check_current_patch_level() because with
       CONFIG_PARAVIRT enabled and on 32-bit, where we run before
       paging has been enabled, we cannot deref pv_info yet. Or we
       could, but we'd need to access its physical address. This way of
       fixing it is simpler. See:
      
         https://bugzilla.suse.com/show_bug.cgi?id=943179 for the background.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Takashi Iwai <tiwai@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1444641762-9437-6-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • efi: Add "efi_fake_mem" boot option · 0f96a99d
      Committed by Taku Izumi
      This patch introduces a new boot option named "efi_fake_mem".
      By specifying this parameter, you can add an arbitrary attribute
      to a specific memory range.
      This is useful for debugging the Address Range Mirroring feature.
      
      For example, if "efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000"
      is specified, the original (firmware provided) EFI memmap will be
      updated so that the specified memory regions have
      EFI_MEMORY_MORE_RELIABLE attribute (0x10000):
      
       <original>
         efi: mem36: [Conventional Memory|  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000100000000-0x00000020a0000000) (129536MB)
      
       <updated>
         efi: mem36: [Conventional Memory|  |MR|  |  |  |   |WB|WT|WC|UC] range=[0x0000000100000000-0x0000000180000000) (2048MB)
         efi: mem37: [Conventional Memory|  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000180000000-0x00000010a0000000) (61952MB)
         efi: mem38: [Conventional Memory|  |MR|  |  |  |   |WB|WT|WC|UC] range=[0x00000010a0000000-0x0000001120000000) (2048MB)
         efi: mem39: [Conventional Memory|  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000001120000000-0x00000020a0000000) (63488MB)
      
      And you will find that the following message is output:
      
         efi: Memory: 4096M/131455M mirrored memory
      Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
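The way one memory map entry turns into several in the "efi_fake_mem" example can be sketched as a range split. This is a simplified model under stated assumptions: `struct sketch_range` and `sketch_split()` are hypothetical, handle a single fake range fully contained in one original entry, and do not reproduce the kernel's memmap code.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_range {
	uint64_t start, end;	/* half-open: [start, end) */
	uint64_t attr;
};

/*
 * Split `orig` around `fake` (which must lie inside it) and OR `fake.attr`
 * into the overlapping piece. Writes up to 3 ranges, returns the count.
 */
static int sketch_split(struct sketch_range orig, struct sketch_range fake,
			struct sketch_range out[3])
{
	int n = 0;

	assert(orig.start <= fake.start && fake.start < fake.end &&
	       fake.end <= orig.end);
	if (orig.start < fake.start)
		out[n++] = (struct sketch_range){ orig.start, fake.start, orig.attr };
	out[n++] = (struct sketch_range){ fake.start, fake.end,
					  orig.attr | fake.attr };
	if (fake.end < orig.end)
		out[n++] = (struct sketch_range){ fake.end, orig.end, orig.attr };
	return n;
}
```

Applying the split once for "2G@4G" and once more to the remainder for "2G@0x10a0000000" yields the four entries shown in the updated map above.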