1. 21 10月, 2015 7 次提交
    • B
      x86/microcode: Unmodularize the microcode driver · 9a2bc335
      Borislav Petkov 提交于
      Make CONFIG_MICROCODE a bool. It was practically a bool already anyway,
      since early loader was forcing it to =y.
      
      Regardless, there's no real reason to have something be a module which
      gets built-in on the majority of installations out there. And its not
      like there's noticeable change in functionality - we still can load late
      microcode - just the module glue disappears.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9a2bc335
    • A
      x86/mce: Fix thermal throttling reporting after kexec · 81ffdcdd
      Andi Kleen 提交于
      The per CPU thermal vector init code checks if the thermal
      vector is already installed and complains and bails out if it
      is.
      
      This happens after kexec, as kernel shut down does not clear the
      thermal vector APIC register.
      
      This causes two problems:
      
      1. So we always do not fully initialize thermal reports after
         kexec. The CPU is still likely initialized, as the previous
         kernel should have done it. But we don't set up the software
         pointer to the thermal vector, so reporting may end up with a
         unknown thermal interrupt message.
      
      2. Also it complains for every logical CPU, even though the
         value is actually derived from BP only.
      
      The problem is that we end up with one message per CPU, so on
      larger systems it becomes very noisy and messes up the otherwise
      nicely formatted CPU bootup numbers in the kernel log.
      
      Just remove the check. I checked the code and there's no valid
      code paths where the thermal init code for a CPU could be called
      multiple times.
      
      Why the kernel does not clean up this value on shutdown:
      
      The thermal monitoring is controlled per logical CPU thread.
      Normal shutdown code is just running on one CPU. To disable it
      we would need a broadcast NMI to all CPUs on shut down. That's
      overkill for this. So we just ignore it after kexec.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1445246268-26285-9-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      81ffdcdd
    • B
      x86/setup/crash: Check memblock_reserve() retval · 6f376057
      Borislav Petkov 提交于
      memblock_reserve() can fail but the crashkernel reservation code
      doesn't check that and this can lead the user into believing
      that the crashkernel region was actually reserved. Make sure we
      check that return value and we exit early with a failure message
      in the error case.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-7-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6f376057
    • B
      x86/setup/crash: Cleanup some more · f56d5578
      Borislav Petkov 提交于
      * Remove unused auto_set variable
      * Cleanup local function variable declarations
      * Reformat printk string and use pr_info()
      
      No functionality change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-6-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f56d5578
    • B
      x86/setup/crash: Remove alignment variable · 606134f7
      Borislav Petkov 提交于
      Use a macro instead. No functionality change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-5-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      606134f7
    • B
      x86/setup: Cleanup crashkernel reservation functions · 97eac21b
      Borislav Petkov 提交于
      * Shorten variable names
      * Realign code, space out for better readability
      
      No code changed:
      
        # arch/x86/kernel/setup.o:
      
         text    data     bss     dec     hex filename
         4543    3096   69904   77543   12ee7 setup.o.before
         4543    3096   69904   77543   12ee7 setup.o.after
      
      md5:
         8a1b7c6738a553ca207b56bd84a8f359  setup.o.before.asm
         8a1b7c6738a553ca207b56bd84a8f359  setup.o.after.asm
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      97eac21b
    • B
      x86/setup: Do not reserve crashkernel high memory if low reservation failed · eb6db83d
      Baoquan He 提交于
      People reported that when allocating crashkernel memory using
      the ",high" and ",low" syntax, there were cases where the
      reservation of the high portion succeeds but the reservation of
      the low portion fails.
      
      Then kexec can load the kdump kernel successfully, but booting
      the kdump kernel fails as there's no low memory.
      
      The low memory allocation for the kdump kernel can fail on large
      systems for a couple of reasons. For example, the manually
      specified crashkernel low memory can be too large and thus no
      adequate memblock region would be found.
      
      Therefore, we try to reserve low memory for the crash kernel
      *after* the high memory portion has been allocated. If that
      fails, we free crashkernel high memory too and return. The user
      can then take measures accordingly.
      Tested-by: NJoerg Roedel <jroedel@suse.de>
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      [ Massage text. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Cc: yinghai@kernel.org
      Link: http://lkml.kernel.org/r/1445246268-26285-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      eb6db83d
  2. 12 10月, 2015 2 次提交
  3. 02 10月, 2015 1 次提交
    • L
      x86/kexec: Fix kexec crash in syscall kexec_file_load() · e3c41e37
      Lee, Chun-Yi 提交于
      The original bug is a page fault crash that sometimes happens
      on big machines when preparing ELF headers:
      
          BUG: unable to handle kernel paging request at ffffc90613fc9000
          IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
      
      The bug is caused by us under-counting the number of memory ranges
      and subsequently not allocating enough ELF header space for them.
      The bug is typically masked on smaller systems, because the ELF header
      allocation is rounded up to the next page.
      
      This patch modifies the code in fill_up_crash_elf_data() by using
      walk_system_ram_res() instead of walk_system_ram_range() to correctly
      count the max number of crash memory ranges. That's because the
      walk_system_ram_range() filters out small memory regions that
      reside in the same page, but walk_system_ram_res() does not.
      
      Here's how I found the bug:
      
      After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
      the code uses walk_system_ram_res() to fill-in crash memory regions information
      to the program header, so it counts those small memory regions that
      reside in a page area.
      
      But, when the kernel was using walk_system_ram_range() in
      fill_up_crash_elf_data() to count the number of crash memory regions,
      it filters out small regions.
      
      I printed those small memory regions, for example:
      
        kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
      
      Based on the code in walk_system_ram_range(), this memory region
      will be filtered out:
      
        pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
        end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
        end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
      
      So, the max_nr_ranges that's counted by the kernel doesn't include
      small memory regions - causing us to under-allocate the required space.
      That causes the page fault crash that happens in a later code path
      when preparing ELF headers.
      
      This bug is not easy to reproduce on small machines that have few
      CPUs, because the allocated page aligned ELF buffer has more free
      space to cover those small memory regions' PT_LOAD headers.
      Signed-off-by: NLee, Chun-Yi <jlee@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e3c41e37
  4. 01 10月, 2015 2 次提交
    • T
      x86/process: Unify 32bit and 64bit implementations of get_wchan() · 7ba78053
      Thomas Gleixner 提交于
      The stack layout and the functionality is identical. Use the 64bit
      version for all of x86.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@alien8.de>
      Reviewed-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: kasan-dev <kasan-dev@googlegroups.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Link: http://lkml.kernel.org/r/20150930083302.779694618@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      7ba78053
    • T
      x86/process: Add proper bound checks in 64bit get_wchan() · eddd3826
      Thomas Gleixner 提交于
      Dmitry Vyukov reported the following using trinity and the memory
      error detector AddressSanitizer
      (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
      
      [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
      address ffff88002e280000
      [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
      the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
      [ 124.578633] Accessed by thread T10915:
      [ 124.579295] inlined in describe_heap_address
      ./arch/x86/mm/asan/report.c:164
      [ 124.579295] #0 ffffffff810dd277 in asan_report_error
      ./arch/x86/mm/asan/report.c:278
      [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
      ./arch/x86/mm/asan/asan.c:37
      [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
      [ 124.581893] #3 ffffffff8107c093 in get_wchan
      ./arch/x86/kernel/process_64.c:444
      
      The address checks in the 64bit implementation of get_wchan() are
      wrong in several ways:
      
       - The lower bound of the stack is not the start of the stack
         page. It's the start of the stack page plus sizeof (struct
         thread_info)
      
       - The upper bound must be:
      
             top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
      
         The 2 * sizeof(unsigned long) is required because the stack pointer
         points at the frame pointer. The layout on the stack is: ... IP FP
         ... IP FP. So we need to make sure that both IP and FP are in the
         bounds.
      
      Fix the bound checks and get rid of the mix of numeric constants, u64
      and unsigned long. Making all unsigned long allows us to use the same
      function for 32bit as well.
      
      Use READ_ONCE() when accessing the stack. This does not prevent a
      concurrent wakeup of the task and the stack changing, but at least it
      avoids TOCTOU.
      
      Also check task state at the end of the loop. Again that does not
      prevent concurrent changes, but it avoids walking for nothing.
      
      Add proper comments while at it.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@alien8.de>
      Reviewed-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: kasan-dev <kasan-dev@googlegroups.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      eddd3826
  5. 30 9月, 2015 1 次提交
  6. 28 9月, 2015 1 次提交
    • A
      x86/mce: Don't clear shared banks on Intel when offlining CPUs · 6e06780a
      Ashok Raj 提交于
      It is not safe to clear global MCi_CTL banks during CPU offline
      or suspend/resume operations. These MSRs are either
      thread-scoped (meaning private to a thread), or core-scoped
      (private to threads in that core only), or with a socket scope:
      visible and controllable from all threads in the socket.
      
      When we offline a single CPU, clearing those MCi_CTL bits will
      stop signaling for all the shared, i.e., socket-wide resources,
      such as LLC, iMC, etc.
      
      In addition, it might be possible to compromise the integrity of
      an Intel Secure Guard eXtentions (SGX) system if the attacker
      has control of the host system and is able to inject errors
      which would be otherwise ignored when MCi_CTL bits are cleared.
      
      Hence on SGX enabled systems, if MCi_CTL is cleared, SGX gets
      disabled.
      Tested-by: NSerge Ayoun <serge.ayoun@intel.com>
      Signed-off-by: NAshok Raj <ashok.raj@intel.com>
      [ Cleanup text. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NTony Luck <tony.luck@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1441391390-16985-1-git-send-email-ashok.raj@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6e06780a
  7. 25 9月, 2015 1 次提交
  8. 23 9月, 2015 2 次提交
  9. 18 9月, 2015 3 次提交
  10. 17 9月, 2015 1 次提交
  11. 16 9月, 2015 3 次提交
  12. 15 9月, 2015 2 次提交
    • S
      x86/apic: Serialize LVTT and TSC_DEADLINE writes · 5d7c631d
      Shaohua Li 提交于
      The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
      MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
      guaranteed that the write to LVTT has reached the APIC before the
      TSC_DEADLINE MSR is written. In such a case the write to the MSR is
      ignored and as a consequence the local timer interrupt never fires.
      
      The SDM decribes this issue for xAPIC and x2APIC modes. The
      serialization methods recommended by the SDM differ.
      
      xAPIC:
       "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
        2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
        3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
        4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."
      
      x2APIC:
       "To allow for efficient access to the APIC registers in x2APIC mode,
        the serializing semantics of WRMSR are relaxed when writing to the
        APIC registers. Thus, system software should not use 'WRMSR to APIC
        registers in x2APIC mode' as a serializing instruction. Read and write
        accesses to the APIC registers will occur in program order. A WRMSR to
        an APIC register may complete before all preceding stores are globally
        visible; software can prevent this by inserting a serializing
        instruction, an SFENCE, or an MFENCE before the WRMSR."
      
      The xAPIC method is to just wait for the memory mapped write to hit
      the LVTT by checking whether the MSR write has reached the hardware.
      There is no reason why a proper MFENCE after the memory mapped write would
      not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
      xAPIC case as well.
      
      Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
      unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
      support.
      
      [ tglx: Massaged the changelog ]
      Signed-off-by: NShaohua Li <shli@fb.com>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: <Kernel-team@fb.com>
      Cc: <lenb@kernel.org>
      Cc: <fenghua.yu@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org #v3.7+
      Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5d7c631d
    • T
      x86/ioapic: Force affinity setting in setup_ioapic_dest() · 4857c91f
      Thomas Gleixner 提交于
      The recent ioapic cleanups changed the affinity setting in
      setup_ioapic_dest() from a direct write to the hardware to the delayed
      affinity setup via irq_set_affinity().
      
      That results in a warning from chained_irq_exit():
      WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq
      [<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
      [<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
      [<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0
      
      The reason is that irq_set_affinity() does not write directly to the
      hardware. It marks the affinity setting as pending and executes it
      from the next interrupt. The chained handler infrastructure does not
      take the irq descriptor lock for performance reasons because such a
      chained interrupt is not visible to any interfaces. So the delayed
      affinity setting triggers the warning in irq_move_masked_irq().
      
      Restore the old behaviour by calling the set_affinity function of the
      ioapic chip in setup_ioapic_dest(). This is safe as none of the
      interrupts can be on the fly at this point.
      
      Fixes: aa5cb97f 'x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces'
      Reported-and-tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: jarkko.nikula@linux.intel.com
      4857c91f
  13. 14 9月, 2015 1 次提交
    • J
      x86/ldt: Fix small LDT allocation for Xen · f454b478
      Jan Beulich 提交于
      While the following commit:
      
        37868fe1 ("x86/ldt: Make modify_ldt synchronous")
      
      added a nice comment explaining that Xen needs page-aligned
      whole page chunks for guest descriptor tables, it then
      nevertheless used kzalloc() on the small size path.
      
      As I'm unaware of guarantees for kmalloc(PAGE_SIZE, ) to return
      page-aligned memory blocks, I believe this needs to be switched
      back to __get_free_page() (or better get_zeroed_page()).
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/55E735D6020000780009F1E6@prv-mh.provo.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f454b478
  14. 13 9月, 2015 2 次提交
  15. 11 9月, 2015 4 次提交
    • A
      perf/x86/intel/bts: Set event->hw.itrace_started in pmu::start to match the new logic · d2498729
      Alexander Shishkin 提交于
      Since event->hw.itrace_started is now set in pmu::start() to signal the beginning of
      the trace, do so also in the intel_bts driver.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/1437140050-23363-4-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d2498729
    • C
      dma-mapping: consolidate dma_set_mask · 452e06af
      Christoph Hellwig 提交于
      Almost everyone implements dma_set_mask the same way, although some time
      that's hidden in ->set_dma_mask methods.
      
      This patch consolidates those into a common implementation that either
      calls ->set_dma_mask if present or otherwise uses the default
      implementation.  Some architectures used to only call ->set_dma_mask
      after the initial checks, and those instance have been fixed to do the
      full work.  h8300 implemented dma_set_mask bogusly as a no-ops and has
      been fixed.
      
      Unfortunately some architectures overload unrelated semantics like changing
      the dma_ops into it so we still need to allow for an architecture override
      for now.
      
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      452e06af
    • C
      dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} · 6894258e
      Christoph Hellwig 提交于
      Since 2009 we have a nice asm-generic header implementing lots of DMA API
      functions for architectures using struct dma_map_ops, but unfortunately
      it's still missing a lot of APIs that all architectures still have to
      duplicate.
      
      This series consolidates the remaining functions, although we still need
      arch opt outs for two of them as a few architectures have very
      non-standard implementations.
      
      This patch (of 5):
      
      The coherent DMA allocator works the same over all architectures supporting
      dma_map operations.
      
      This patch consolidates them and converges the minor differences:
      
       - the debug_dma helpers are now called from all architectures, including
         those that were previously missing them
       - dma_alloc_from_coherent and dma_release_from_coherent are now always
         called from the generic alloc/free routines instead of the ops
         dma-mapping-common.h always includes dma-coherent.h to get the defintions
         for them, or the stubs if the architecture doesn't support this feature
       - checks for ->alloc / ->free presence are removed.  There is only one
         magic instead of dma_map_ops without them (mic_dma_ops) and that one
         is x86 only anyway.
      
      Besides that only x86 needs special treatment to replace a default devices
      if none is passed and tweak the gfp_flags.  An optional arch hook is provided
      for that.
      
      [linux@roeck-us.net: fix build]
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6894258e
    • D
      kexec: split kexec_load syscall from kexec core code · 2965faa5
      Dave Young 提交于
      There are two kexec load syscalls, kexec_load another and kexec_file_load.
       kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
      split kexec_load syscall code to kernel/kexec.c.
      
      And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
      use kexec_file_load only, or vice verse.
      
      The original requirement is from Ted Ts'o, he want kexec kernel signature
      being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
      kexec_load syscall can bypass the checking.
      
      Vivek Goyal proposed to create a common kconfig option so user can compile
      in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
      KEXEC_CORE so that old config files still work.
      
      Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
      architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
      KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
      kexec_load syscall.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NDave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2965faa5
  16. 09 9月, 2015 1 次提交
    • M
      x86: use generic early mem copy · 5dd2c4bd
      Mark Salter 提交于
      The early_ioremap library now has a generic copy_from_early_mem()
      function.  Use the generic copy function for x86 relocate_initrd().
      
      [akpm@linux-foundation.org: remove MAX_MAP_CHUNK define, per Yinghai Lu]
      Signed-off-by: NMark Salter <msalter@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5dd2c4bd
  17. 05 9月, 2015 4 次提交
  18. 04 9月, 2015 1 次提交
    • T
      x86/alternatives: Make optimize_nops() interrupt safe and synced · 66c117d7
      Thomas Gleixner 提交于
      Richard reported the following crash:
      
      [    0.036000] BUG: unable to handle kernel paging request at 55501e06
      [    0.036000] IP: [<c0aae48b>] common_interrupt+0xb/0x38
      [    0.036000] Call Trace:
      [    0.036000]  [<c0409c80>] ? add_nops+0x90/0xa0
      [    0.036000]  [<c040a054>] apply_alternatives+0x274/0x630
      
      Chuck decoded:
      
       "  0:   8d 90 90 83 04 24       lea    0x24048390(%eax),%edx
          6:   80 fc 0f                cmp    $0xf,%ah
          9:   a8 0f                   test   $0xf,%al
       >> b:   a0 06 1e 50 55          mov    0x55501e06,%al
         10:   57                      push   %edi
         11:   56                      push   %esi
      
       Interrupt 0x30 occurred while the alternatives code was replacing the
       initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the
       optimized version, 0x8d,0x76,0x00. Only the first byte has been
       replaced so far, and it makes a mess out of the insn decoding."
      
      optimize_nops() is buggy in two aspects:
      
      - It's not disabling interrupts across the modification
      - It's lacking a sync_core() call
      
      Add both.
      
      Fixes: 4fd4b6e5 'x86/alternatives: Use optimized NOPs for padding'
      Reported-and-tested-by: N"Richard W.M. Jones" <rjones@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Richard W.M. Jones <rjones@redhat.com>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509031232340.15006@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      66c117d7
  19. 28 8月, 2015 1 次提交