1. 06 Dec, 2017: 3 commits
  2. 28 Nov, 2017: 2 commits
  3. 25 Nov, 2017: 2 commits
    • x86/tlb: Disable interrupts when changing CR4 · 9d0b6232
      Nadav Amit committed
      CR4 modifications are implemented as RMW operations which update a shadow
      variable and write the result to CR4. The RMW operation is protected by
      preemption disable, but there is no enforcement or debugging mechanism.
      
      CR4 modifications happen also in interrupt context via
      __native_flush_tlb_global(). This implementation does not affect an
      interrupted thread's CR4 operation, because the CR4 toggle restores
      the original content and does not modify the shadow variable.
      
      So the current situation seems to be safe, but a recent patch tried to add
      an actual RMW operation in interrupt context, which will cause subtle
      corruptions.
      
      To prevent that and make the CR4 handling future proof:
      
       - Add a lockdep assertion to __cr4_set() which will catch interrupt
         enabled invocations
      
       - Disable interrupts in the cr4 manipulator inlines
      
       - Rename cr4_toggle_bits() to cr4_toggle_bits_irqsoff(). This is called
         from __switch_to_xtra() where interrupts are already disabled and
         performance matters.
      
      All other call sites are not performance critical, so the extra overhead of
      an additional local_irq_save/restore() pair is not a problem. If new call
      sites care about performance then the necessary _irqsoff() variants can be
      added.
      
      [ tglx: Condensed the patch by moving the irq protection inside the
        	manipulator functions. Updated changelog ]
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: nadav.amit@gmail.com
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-3-namit@vmware.com
    • x86/tlb: Refactor CR4 setting and shadow write · 0c3292ca
      Nadav Amit committed
      Refactor the write to CR4 and its shadow value. This is done in
      preparation for the addition of an assertion to check that IRQs are
      disabled during CR4 update.
      
      No functional change.
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: nadav.amit@gmail.com
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-2-namit@vmware.com
  4. 24 Nov, 2017: 4 commits
  5. 23 Nov, 2017: 1 commit
    • x86/entry/64: Add missing irqflags tracing to native_load_gs_index() · ca37e57b
      Andy Lutomirski committed
      Running this code with IRQs enabled (where dummy_lock is a spinlock):
      
      static void check_load_gs_index(void)
      {
      	/* This will fail. */
      	load_gs_index(0xffff);
      
      	spin_lock(&dummy_lock);
      	spin_unlock(&dummy_lock);
      }
      
      Will generate a lockdep warning.  The issue is that the actual write
      to %gs would cause an exception with IRQs disabled, and the exception
      handler would, as an inadvertent side effect, update irqflag tracing
      to reflect the IRQs-off status.  native_load_gs_index() would then
      turn IRQs back on and return with irqflag tracing still thinking that
      IRQs were off.  The dummy lock-and-unlock causes lockdep to notice the
      error and warn.
      
      Fix it by adding the missing tracing.
      
      Apparently nothing did this in a context where it mattered.  I haven't
      tried to find a code path that would actually exhibit the warning if
      appropriately nasty user code were running.
      
      I suspect that the security impact of this bug is very, very low --
      production systems don't run with lockdep enabled, and the warning is
      mostly harmless anyway.
      
      Found during a quick audit of the entry code to try to track down an
      unrelated bug that Ingo found in some still-in-development code.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/e1aeb0e6ba8dd430ec36c8a35e63b429698b4132.1511411918.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 22 Nov, 2017: 2 commits
    • x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · f68d62a5
      Andrey Ryabinin committed
      [ Note, this commit is a cherry-picked version of:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 entry code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing · 548c3050
      Andy Lutomirski committed
      When I added entry_SYSCALL_64_after_hwframe(), I left TRACE_IRQS_OFF
      before it.  This means that users of entry_SYSCALL_64_after_hwframe()
      were responsible for invoking TRACE_IRQS_OFF, and the one and only
      user (Xen, added in the same commit) got it wrong.
      
      I think this would manifest as a warning if a Xen PV guest with
      CONFIG_DEBUG_LOCKDEP=y were used with context tracking.  (The
      context tracking bit is to cause lockdep to get invoked before we
      turn IRQs back on.)  I haven't tested that for real yet because I
      can't get a kernel configured like that to boot at all on Xen PV.
      
      Move TRACE_IRQS_OFF below the label.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 8a9949bc ("x86/xen/64: Rearrange the SYSCALL entries")
      Link: http://lkml.kernel.org/r/9150aac013b7b95d62c2336751d5b6e91d2722aa.1511325444.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 21 Nov, 2017: 5 commits
  8. 17 Nov, 2017: 6 commits
  9. 16 Nov, 2017: 3 commits
    • x86/mm: Limit mmap() of /dev/mem to valid physical addresses · be62a320
      Craig Bergstrom committed
      One thing /dev/mem access APIs should verify is that there's no way
      that excessively large pfns can leak into the high bits of the
      page table entry.
      
      In particular, if people can use "very large physical page addresses"
      through /dev/mem to set the bits past bit 58 - SOFTW4 and permission
      key bits and NX bit, that could *really* confuse the kernel.
      
      We had an earlier attempt:
      
        ce56a86e ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses")
      
      ... which turned out to be too restrictive (breaking mem=... bootups for example) and
      had to be reverted in:
      
        90edaac6 ("Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses"")
      
      This v2 attempt modifies the original patch and makes sure that mmap(/dev/mem)
      limits the pfns so that they at least fit in the actual pteval_t architecturally:
      
       - Make sure mmap_mem() actually validates that the offset fits in phys_addr_t
      
          ( This may be indirectly true due to some other check, but it's not
            entirely obvious. )
      
       - Change valid_mmap_phys_addr_range() to just use phys_addr_valid()
         on the top byte
      
          ( Top byte is sufficient, because mmap_mem() has already checked that
            it cannot wrap. )
      
       - Add a few comments about what the valid_phys_addr_range() vs.
         valid_mmap_phys_addr_range() difference is.
      Signed-off-by: Craig Bergstrom <craigb@google.com>
      [ Fixed the checks and added comments. ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [ Collected the discussion and patches into a commit. ]
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Sean Young <sean@mess.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/CA+55aFyEcOMb657vWSmrM13OxmHxC-XxeBmNis=DwVvpJUOogQ@mail.gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/selftests: Add test for mapping placement for 5-level paging · 97f404ad
      Kirill A. Shutemov committed
      5-level paging provides a 56-bit virtual address space for user space
      applications. But the kernel defaults to mappings below the 47-bit address
      space boundary, which is the upper bound for 4-level paging, unless an
      application explicitly requests it by using an mmap(2) address hint above
      the 47-bit boundary. The kernel prevents mappings which span across the
      47-bit boundary unless mmap(2) was invoked with MAP_FIXED.
      
      Add a self-test that covers the corner cases of the interface and validates
      the correctness of the implementation.
      
      [ tglx: Massaged changelog once more ]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20171115143607.81541-2-kirill.shutemov@linux.intel.com
    • x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border · 1e0f25db
      Kirill A. Shutemov committed
      In case of 5-level paging, the kernel does not place any mapping above
      47-bit, unless userspace explicitly asks for it.
      
      Userspace can request an allocation from the full address space by
      specifying the mmap address hint above 47-bit.
      
      Nicholas noticed that the current implementation violates this interface:
      
        If user space requests a mapping at the end of the 47-bit address space
        with a length which causes the mapping to cross the 47-bit border
        (DEFAULT_MAP_WINDOW), then the vma is partially in the address space
        below and above.
      
      Sanity check the mmap address hint so that start and end of the resulting
      vma are on the same side of the 47-bit border. If that's not the case fall
      back to the code path which ignores the address hint and allocate from the
      regular address space below 47-bit.
      
      To make the checks consistent, mask out the address hint's lower bits
      (either PAGE_MASK or huge_page_mask()) instead of using ALIGN() which can
      push them up to the next boundary.
      
      [ tglx: Moved the address check to a function and massaged comment and
        	changelog ]
      Reported-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20171115143607.81541-1-kirill.shutemov@linux.intel.com
  10. 14 Nov, 2017: 12 commits
    • x86/umip: Identify the STR and SLDT instructions · 6e2a3064
      Ricardo Neri committed
      The STR and SLDT instructions are not emulated by the UMIP code, thus
      there's no functionality in the decoder to identify them.
      
      However, a subsequent commit will introduce a warning about the use
      of all the instructions that UMIP protects/changes, not only those that
      are emulated.
      
      A first step for that is to add the ability to decode/identify them.
      
      Plus, now that STR and SLDT are identified, we need to explicitly avoid
      their emulation (i.e., not rely on successful identification). Group
      together all the cases that we do not want to emulate: STR, SLDT and user
      long mode processes.
      Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1510640985-18412-4-git-send-email-ricardo.neri-calderon@linux.intel.com
      [ Rewrote the changelog, fixed ugly col80 artifact. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/umip: Print a line in the boot log that UMIP has been enabled · 770c7755
      Ricardo Neri committed
      Indicate that this feature has been enabled.
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1510640985-18412-3-git-send-email-ricardo.neri-calderon@linux.intel.com
      [ Changelog tweaks. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/umip: Select X86_INTEL_UMIP by default · 796ebc81
      Ricardo Neri committed
      UMIP does not cause any performance penalty for the vast majority of x86 code
      that does not use the legacy instructions affected by UMIP.
      
      Also describe UMIP more accurately and explain the behavior that can be
      expected by the (few) applications that use the affected instructions.
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1510640985-18412-2-git-send-email-ricardo.neri-calderon@linux.intel.com
      [ Spelling fixes, rewrote the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu() · b29c6ef7
      Rafael J. Wysocki committed
      Even though aperfmperf_snapshot_khz() caches the samples.khz value to
      return if called again in a sufficiently short time, its caller,
      arch_freq_get_on_cpu(), still uses smp_call_function_single() to run it
      which may allow user space to trigger an IPI storm by reading from the
      scaling_cur_freq cpufreq sysfs file in a tight loop.
      
      To avoid that, move the decision on whether or not to return the cached
      samples.khz value to arch_freq_get_on_cpu().
      
      This change was part of commit 941f5f0f ("x86: CPU: Fix up "cpu MHz"
      in /proc/cpuinfo"), but it was not the reason for the revert and it
      remains applicable.
      
      Fixes: 4815d3c5 (cpufreq: x86: Make scaling_cur_freq behave more as expected)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: WANG Chao <chao.wang@ucloud.cn>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 99306dfc
      Linus Torvalds committed
      Pull x86 timer updates from Thomas Gleixner:
       "These updates are related to TSC handling:
      
         - Support platforms which have synchronized TSCs but the boot CPU has
           a non-zero TSC_ADJUST value, which is considered a firmware bug on
           normal systems.
      
           This applies to HPE/SGI UV platforms where the platform firmware
           uses TSC_ADJUST to ensure TSC synchronization across a huge number
           of sockets, but due to power on timings the boot CPU cannot be
           guaranteed to have a zero TSC_ADJUST register value.
      
         - Fix the ordering of udelay calibration and kvmclock_init()
      
         - Clean up the udelay and calibration code"
      
      * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/tsc: Mark cyc2ns_init() and detect_art() __init
        x86/platform/UV: Mark tsc_check_sync as an init function
        x86/tsc: Make CONFIG_X86_TSC=n build work again
        x86/platform/UV: Add check of TSC state set by UV BIOS
        x86/tsc: Provide a means to disable TSC ART
        x86/tsc: Drastically reduce the number of firmware bug warnings
        x86/tsc: Skip TSC test and error messages if already unstable
        x86/tsc: Add option that TSC on Socket 0 being non-zero is valid
        x86/timers: Move simple_udelay_calibration() past kvmclock_init()
        x86/timers: Make recalibrate_cpu_khz() void
        x86/timers: Move the simple udelay calibration to tsc.h
    • Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3643b7e0
      Linus Torvalds committed
      Pull x86 cache resource updates from Thomas Gleixner:
       "This update provides updates to RDT:
      
        - A diagnostic framework for the Resource Director Technology (RDT)
          user interface (sysfs). The failure modes of the user interface are
          hard to diagnose from the error codes. An extra last command status
          file now provides sensible textual information about the failure, so
          it's simpler to use.
      
        - A few minor cleanups and updates in the RDT code"
      
      * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/intel_rdt: Fix a silent failure when writing zero value schemata
        x86/intel_rdt: Fix potential deadlock during resctrl mount
        x86/intel_rdt: Fix potential deadlock during resctrl unmount
        x86/intel_rdt: Initialize bitmask of shareable resource if CDP enabled
        x86/intel_rdt: Remove redundant assignment
        x86/intel_rdt/cqm: Make integer rmid_limbo_count static
        x86/intel_rdt: Add documentation for "info/last_cmd_status"
        x86/intel_rdt: Add diagnostics when making directories
        x86/intel_rdt: Add diagnostics when writing the cpus file
        x86/intel_rdt: Add diagnostics when writing the tasks file
        x86/intel_rdt: Add diagnostics when writing the schemata file
        x86/intel_rdt: Add framework for better RDT UI diagnostics
    • Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b18d6289
      Linus Torvalds committed
      Pull x86 APIC updates from Thomas Gleixner:
       "This update provides a major overhaul of the APIC initialization and
        vector allocation code:
      
         - Unification of the APIC and interrupt mode setup which was
           scattered all over the place and was hard to follow. This also
           disentangles the timer setup from the APIC initialization which
           brings a clear separation of functionality.
      
           Great detective work from Dou Liyang!
      
         - Refactoring of the x86 vector allocation mechanism. The existing
           code was based on nested loops and rather convoluted APIC callbacks
           which had a horrible worst case behaviour and tried to serve all
           different use cases in one go. This led to quite odd hacks when
           supporting the new managed interrupt facility for multiqueue devices
           and made it more or less impossible to deal with the vector space
           exhaustion which was a major roadblock for server hibernation.
      
           Aside from that the code dealing with cpu hotplug and the system
           vectors was disconnected from the actual vector management and
           allocation code, which made it hard to follow and maintain.
      
           Utilizing the new bitmap matrix allocator core mechanism, the new
           allocator and management code consolidates the handling of system
           vectors, legacy vectors, cpu hotplug mechanisms and the actual
           allocation which needs to be aware of system and legacy vectors and
           hotplug constraints into a single consistent entity.
      
           This has one visible change: The support for multi CPU targets of
           interrupts, which is only available on a certain subset of
           CPUs/APIC variants, has been removed in favour of single interrupt
           targets. A proper analysis of the multi CPU target feature revealed
           that there is no real advantage as the vast majority of interrupts
           end up on the CPU with the lowest APIC id in the set of target CPUs
           anyway. That change was agreed on by the relevant folks and allowed
           the implementation to be simplified significantly and rather
           fragile constructs like the vector cleanup IPI to be replaced with
           straightforward and solid code.
      
           Furthermore, this allowed the allocation details to be cleanly separated
           for legacy, normal and managed interrupts:
      
            * Legacy interrupts are no longer wasting 16 vectors
              unconditionally
      
            * Managed interrupts now have a guaranteed vector reservation, but
              the actual vector assignment happens when the interrupt is
              requested. It's guaranteed not to fail.
      
            * Normal interrupts no longer allocate vectors unconditionally
              when the interrupt is set up (IO/APIC init or MSI(X) enable).
              The mechanism has been switched to a best effort reservation
              mode. The actual allocation happens when the interrupt is
              requested. Contrary to managed interrupts the request can fail
              due to vector space exhaustion, but drivers must handle a failure
              of request_irq() anyway. When the interrupt is freed, the vector
              is handed back as well.
      
              This solves a long standing problem with large unconditional
              vector allocations for a certain class of enterprise devices
              which prevented server hibernation due to vector space
              exhaustion when the unused allocated vectors had to be migrated
              to CPU0 while unplugging all non boot CPUs.
      
           The code has been equipped with trace points and detailed debugfs
           information to aid analysis of the vector space"
      
      * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
        x86/vector/msi: Select CONFIG_GENERIC_IRQ_RESERVATION_MODE
        PCI/MSI: Set MSI_FLAG_MUST_REACTIVATE in core code
        genirq: Add config option for reservation mode
        x86/vector: Use correct per cpu variable in free_moved_vector()
        x86/apic/vector: Ignore set_affinity call for inactive interrupts
        x86/apic: Fix spelling mistake: "symmectic" -> "symmetric"
        x86/apic: Use dead_cpu instead of current CPU when cleaning up
        ACPI/init: Invoke early ACPI initialization earlier
        x86/vector: Respect affinity mask in irq descriptor
        x86/irq: Simplify hotplug vector accounting
        x86/vector: Switch IOAPIC to global reservation mode
        x86/vector/msi: Switch to global reservation mode
        x86/vector: Handle managed interrupts proper
        x86/io_apic: Reevaluate vector configuration on activate()
        iommu/amd: Reevaluate vector configuration on activate()
        iommu/vt-d: Reevaluate vector configuration on activate()
        x86/apic/msi: Force reactivation of interrupts at startup time
        x86/vector: Untangle internal state from irq_cfg
        x86/vector: Compile SMP only code conditionally
        x86/apic: Remove unused callbacks
        ...
    • Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7d58e1c9
      Linus Torvalds committed
      Pull smp/hotplug updates from Thomas Gleixner:
       "No functional changes, just removal of obsolete and outdated defines,
        macros and documentation"
      
      * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        cpu/hotplug: Get rid of CPU hotplug notifier leftovers
        cpu/hotplug: Remove obsolete notifier macros
    • Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2bcc6731
      Linus Torvalds committed
      Pull timer updates from Thomas Gleixner:
       "Yet another big pile of changes:
      
         - More year 2038 work from Arnd slowly reaching the point where we
           need to think about the syscalls themselves.
      
         - A new timer function which conditionally (re)arms a timer only
           when it's either not running or the new expiry time is sooner
           than the armed expiry time. This allows a single timer to be used
           for multiple timeout requirements without caring about the first
           expiry time at the call site.
      
         - A new NMI safe accessor to clock real time for the printk timestamp
           work. Can be used by tracing, perf as well if required.
      
         - A large number of timer setup conversions from Kees which got
           collected here because either maintainers requested so or they
           simply got ignored. As Kees pointed out already there are a few
           trivial merge conflicts and some redundant commits which were
           unavoidable due to the size of this conversion effort.
      
         - Avoid a redundant iteration in the timer wheel softirq processing.
      
         - Provide a mechanism to treat RTC implementations depending on their
           hardware properties, i.e. don't inflict the write at the 0.5
           seconds boundary which originates from the PC CMOS RTC to all RTCs.
           No functional change as drivers need to be updated separately.
      
         - The usual small updates to core code clocksource drivers. Nothing
           really exciting"
      
      * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
        timers: Add a function to start/reduce a timer
        pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
        timer: Prepare to change all DEFINE_TIMER() callbacks
        netfilter: ipvs: Convert timers to use timer_setup()
        scsi: qla2xxx: Convert timers to use timer_setup()
        block/aoe: discover_timer: Convert timers to use timer_setup()
        ide: Convert timers to use timer_setup()
        drbd: Convert timers to use timer_setup()
        mailbox: Convert timers to use timer_setup()
        crypto: Convert timers to use timer_setup()
        drivers/pcmcia: omap1: Fix error in automated timer conversion
        ARM: footbridge: Fix typo in timer conversion
        drivers/sgi-xp: Convert timers to use timer_setup()
        drivers/pcmcia: Convert timers to use timer_setup()
        drivers/memstick: Convert timers to use timer_setup()
        drivers/macintosh: Convert timers to use timer_setup()
        hwrng/xgene-rng: Convert timers to use timer_setup()
        auxdisplay: Convert timers to use timer_setup()
        sparc/led: Convert timers to use timer_setup()
        mips: ip22/32: Convert timers to use timer_setup()
        ...
    • Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 670310df
      Linus Torvalds committed
      Pull irq core updates from Thomas Gleixner:
       "A rather large update for the interrupt core code and the irq chip drivers:
      
         - Add a new bitmap matrix allocator and supporting changes, which is
           used to replace the x86 vector allocator, which comes with a
           separate pull request. This allows replacing the convoluted nested
           loop allocation function in x86 with a facility which properly
           supports the recently added managed interrupt property and allows
           switching to a best effort vector reservation scheme, which
           addresses problems with vector exhaustion.
      
         - A large update to the ARM GIC-V3-ITS driver adding support for
           range selectors.
      
         - New interrupt controllers:
             - Meson and Meson8 GPIO
             - BCM7271 L2
             - Socionext EXIU
      
           If you expected that this will stop at some point, I have to
           disappoint you. There are new ones posted already. Sigh!
      
         - STM32 interrupt controller support for new platforms.
      
         - A pile of fixes, cleanups and updates to the MIPS GIC driver
      
         - The usual small fixes, cleanups and updates all over the place.
           The most visible one moves the irq chip drivers' Kconfig switches
           into a separate Kconfig menu"
      
      * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
        genirq: Fix type of shifting literal 1 in __setup_irq()
        irqdomain: Drop pointless NULL check in virq_debug_show_one
        genirq/proc: Return proper error code when irq_set_affinity() fails
        irq/work: Use llist_for_each_entry_safe
        irqchip: mips-gic: Print warning if inherited GIC base is used
        irqchip/mips-gic: Add pr_fmt and reword pr_* messages
        irqchip/stm32: Move the wakeup on interrupt mask
        irqchip/stm32: Fix initial values
        irqchip/stm32: Add stm32h7 support
        dt-bindings/interrupt-controllers: Add compatible string for stm32h7
        irqchip/stm32: Add multi-bank management
        irqchip/stm32: Select GENERIC_IRQ_CHIP
        irqchip/exiu: Add support for Socionext Synquacer EXIU controller
        dt-bindings: Add description of Socionext EXIU interrupt controller
        irqchip/gic-v3-its: Fix VPE activate callback return value
        irqchip: mips-gic: Make IPI bitmaps static
        irqchip: mips-gic: Share register writes in gic_set_type()
        irqchip: mips-gic: Remove gic_vpes variable
        irqchip: mips-gic: Use num_possible_cpus() to reserve IPIs
        irqchip: mips-gic: Configure EIC when CPUs come online
        ...
    • Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 43ff2f4d
      Linus Torvalds committed
      Pull x86 platform updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - a refactoring of the early virt init code by merging 'struct
           x86_hyper' into 'struct x86_platform' and 'struct x86_init', which
           allows simplifications and also the addition of a new
           ->guest_late_init() callback. (Juergen Gross)
      
         - timer_setup() conversion of the UV code (Kees Cook)"
      
      * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/virt/xen: Use guest_late_init to detect Xen PVH guest
        x86/virt, x86/platform: Add ->guest_late_init() callback to hypervisor_x86 structure
        x86/virt, x86/acpi: Add test for ACPI_FADT_NO_VGA
        x86/virt: Add enum for hypervisors to replace x86_hyper
        x86/virt, x86/platform: Merge 'struct x86_hyper' into 'struct x86_platform' and 'struct x86_init'
        x86/platform/UV: Convert timers to use timer_setup()
    • Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 13e57da4
      Linus Torvalds committed
      Pull x86 debug update from Ingo Molnar:
       "A single change enhancing stack traces by hiding wrapper function
        entries"
      
      * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/stacktrace: Avoid recording save_stack_trace() wrappers