1. 19 December 2017 (1 commit)
    • x86/stacktrace: Make zombie stack traces reliable · 6454b3bd
      Committed by Josh Poimboeuf
      Commit:
      
        1959a601 ("x86/dumpstack: Pin the target stack when dumping it")
      
      changed the behavior of stack traces for zombies.  Before that commit,
      /proc/<pid>/stack reported the last execution path of the zombie before
      it died:
      
        [<ffffffff8105b877>] do_exit+0x6f7/0xa80
        [<ffffffff8105bc79>] do_group_exit+0x39/0xa0
        [<ffffffff8105bcf0>] __wake_up_parent+0x0/0x30
        [<ffffffff8152dd09>] system_call_fastpath+0x16/0x1b
        [<00007fd128f9c4f9>] 0x7fd128f9c4f9
        [<ffffffffffffffff>] 0xffffffffffffffff
      
      After the commit, it just reports an empty stack trace.
      
      The new behavior is actually probably more correct.  If the stack
      refcount has gone down to zero, then the task has already gone through
      do_exit() and isn't going to run anymore.  The stack could be freed at
      any time and is basically gone, so reporting an empty stack makes sense.
      
      However, save_stack_trace_tsk_reliable() treats such a missing stack
      condition as an error.  That can cause livepatch transition stalls if
      there are any unreaped zombies.  Instead, just treat it as a reliable,
      empty stack.
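
      The fix boils down to bailing out early when the task's stack can no
      longer be pinned.  A minimal sketch of that check (the surrounding
      function body is not quoted here):

      	/*
      	 * Sketch: if the stack refcount already dropped to zero (e.g. an
      	 * unreaped zombie), report an empty trace and treat it as reliable.
      	 */
      	if (!try_get_task_stack(tsk))
      		return 0;
      	/* ... walk and record the stack ... */
      	put_task_stack(tsk);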
      Reported-and-tested-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Fixes: af085d90 ("stacktrace/x86: add function for detecting reliable stack traces")
      Link: http://lkml.kernel.org/r/e4b09e630e99d0c1080528f0821fc9d9dbaeea82.1513631620.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 18 December 2017 (1 commit)
    • x86/mm: Unbreak modules that use the DMA API · 9d5f38ba
      Committed by Tom Lendacky
      Commit d8aa7eea ("x86/mm: Add Secure Encrypted Virtualization (SEV)
      support") changed sme_active() from an inline function that referenced
      sme_me_mask to a non-inlined function in order to make the sev_enabled
      variable a static variable.  This function was marked EXPORT_SYMBOL_GPL
      because at the time the patch was submitted, sme_me_mask was marked
      EXPORT_SYMBOL_GPL.
      
      Commit 87df2617 ("x86/mm: Unbreak modules that rely on external
      PAGE_KERNEL availability") changed the sme_me_mask variable from
      EXPORT_SYMBOL_GPL to EXPORT_SYMBOL, allowing external modules to build
      with CONFIG_AMD_MEM_ENCRYPT=y.  Now, however, with sev_active() no
      longer an inline function and marked as EXPORT_SYMBOL_GPL, external
      modules that use the DMA API are once again broken in 4.15.  Since the
      DMA API is meant to be used by external modules, this needs to be changed.
      
      Change the sme_active() and sev_active() functions from EXPORT_SYMBOL_GPL
      to EXPORT_SYMBOL.
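
      In code terms the change is only the export macro.  A sketch, assuming
      the 4.15-era function bodies in arch/x86/mm/mem_encrypt.c:

      	bool sme_active(void)
      	{
      		return sme_me_mask && !sev_enabled;
      	}
      	EXPORT_SYMBOL(sme_active);		/* was EXPORT_SYMBOL_GPL */

      	bool sev_active(void)
      	{
      		return sme_me_mask && sev_enabled;
      	}
      	EXPORT_SYMBOL(sev_active);		/* was EXPORT_SYMBOL_GPL */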
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Link: https://lkml.kernel.org/r/20171215162011.14125.7113.stgit@tlendack-t1.amdoffice.net
  3. 16 December 2017 (1 commit)
    • x86/build: Make isoimage work on Debian · 5f0e3fe6
      Committed by Matthew Wilcox
      Debian does not ship a 'mkisofs' symlink to genisoimage.  All modern
      distros ship genisoimage, so just use that directly.  That requires
      renaming the 'genisoimage' function.  Also neaten up the 'for' loop
      while I'm in here.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 15 December 2017 (6 commits)
  5. 12 December 2017 (2 commits)
  6. 11 December 2017 (1 commit)
    • x86/mm/kmmio: Fix mmiotrace for page unaligned addresses · 6d60ce38
      Committed by Karol Herbst
      If something calls ioremap() with an address not aligned to PAGE_SIZE, the
      returned address might be not aligned as well. This led to a probe
      registered on exactly the returned address, but the entire page was armed
      for mmiotracing.
      
      On calling iounmap() the address passed to unregister_kmmio_probe() was
      PAGE_SIZE aligned by the caller leading to a complete freeze of the
      machine.
      
      We should always page align addresses while (un)registering mappings,
      because the mmiotracer works on top of pages, not mappings.  We still
      keep track of the probes based on their real addresses and lengths
      though, because the mmiotracer still needs to know which memory regions
      are mapped.
      
      Also move the call to mmiotrace_iounmap() prior to page aligning the
      address, so that all probes are unregistered properly; otherwise the
      kernel ends up failing memory allocations randomly after disabling the
      mmiotracer.
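
      The alignment itself is the usual PAGE_MASK/PAGE_ALIGN idiom.  A rough
      sketch of what registration has to cover, with arm_one_page() as a
      hypothetical stand-in for the kmmio fault-page handling:

      	/* Arm every page covered by the probe, not just p->addr itself. */
      	unsigned long page_start = p->addr & PAGE_MASK;
      	unsigned long page_end   = PAGE_ALIGN(p->addr + p->len);
      	unsigned long addr;

      	for (addr = page_start; addr < page_end; addr += PAGE_SIZE)
      		arm_one_page(addr);	/* hypothetical helper */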
      Tested-by: Lyude <lyude@redhat.com>
      Signed-off-by: Karol Herbst <kherbst@redhat.com>
      Acked-by: Pekka Paalanen <ppaalanen@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: nouveau@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/20171127075139.4928-1-kherbst@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 07 December 2017 (5 commits)
  8. 06 December 2017 (4 commits)
  9. 28 November 2017 (2 commits)
  10. 25 November 2017 (2 commits)
    • x86/tlb: Disable interrupts when changing CR4 · 9d0b6232
      Committed by Nadav Amit
      CR4 modifications are implemented as RMW operations which update a shadow
      variable and write the result to CR4. The RMW operation is protected by
      preemption disable, but there is no enforcement or debugging mechanism.
      
      CR4 modifications also happen in interrupt context via
      __native_flush_tlb_global().  This implementation does not affect the
      CR4 operation of an interrupted thread context, because the CR4 toggle
      restores the original content and does not modify the shadow variable.
      
      So the current situation seems to be safe, but a recent patch tried to add
      an actual RMW operation in interrupt context, which will cause subtle
      corruptions.
      
      To prevent that and make the CR4 handling future proof:
      
       - Add a lockdep assertion to __cr4_set() which will catch interrupt
         enabled invocations
      
       - Disable interrupts in the cr4 manipulator inlines
      
       - Rename cr4_toggle_bits() to cr4_toggle_bits_irqsoff(). This is called
         from __switch_to_xtra() where interrupts are already disabled and
         performance matters.
      
      All other call sites are not performance critical, so the extra overhead of
      an additional local_irq_save/restore() pair is not a problem. If new call
      sites care about performance then the necessary _irqsoff() variants can be
      added.
      
      [ tglx: Condensed the patch by moving the irq protection inside the
        	manipulator functions. Updated changelog ]
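
      A sketch of the resulting manipulator shape (simplified; the real
      inlines live in arch/x86/include/asm/tlbflush.h and may differ in
      detail):

      	static inline void cr4_set_bits(unsigned long mask)
      	{
      		unsigned long cr4, flags;

      		local_irq_save(flags);
      		cr4 = this_cpu_read(cpu_tlbstate.cr4);
      		if ((cr4 | mask) != cr4)
      			__cr4_set(cr4 | mask);	/* asserts IRQs are off */
      		local_irq_restore(flags);
      	}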
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Luck <tony.luck@intel.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: nadav.amit@gmail.com
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-3-namit@vmware.com
    • x86/tlb: Refactor CR4 setting and shadow write · 0c3292ca
      Committed by Nadav Amit
      Refactor the write to CR4 and its shadow value. This is done in
      preparation for the addition of an assertion to check that IRQs are
      disabled during CR4 update.
      
      No functional change.
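
      The refactoring concentrates the shadow update and the hardware write
      in a single helper, roughly:

      	/* One place where both the shadow value and the real CR4 are written. */
      	static inline void __cr4_set(unsigned long cr4)
      	{
      		this_cpu_write(cpu_tlbstate.cr4, cr4);
      		__write_cr4(cr4);
      	}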
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: nadav.amit@gmail.com
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-2-namit@vmware.com
  11. 24 November 2017 (4 commits)
  12. 23 November 2017 (1 commit)
    • x86/entry/64: Add missing irqflags tracing to native_load_gs_index() · ca37e57b
      Committed by Andy Lutomirski
      Running this code with IRQs enabled (where dummy_lock is a spinlock):
      
      static void check_load_gs_index(void)
      {
      	/* This will fail. */
      	load_gs_index(0xffff);
      
      	spin_lock(&dummy_lock);
      	spin_unlock(&dummy_lock);
      }
      
      Will generate a lockdep warning.  The issue is that the actual write
      to %gs would cause an exception with IRQs disabled, and the exception
      handler would, as an inadvertent side effect, update irqflag tracing
      to reflect the IRQs-off status.  native_load_gs_index() would then
      turn IRQs back on and return with irqflag tracing still thinking that
      IRQs were off.  The dummy lock-and-unlock causes lockdep to notice the
      error and warn.
      
      Fix it by adding the missing tracing.
      
      Apparently nothing did this in a context where it mattered.  I haven't
      tried to find a code path that would actually exhibit the warning if
      appropriately nasty user code were running.
      
      I suspect that the security impact of this bug is very, very low --
      production systems don't run with lockdep enabled, and the warning is
      mostly harmless anyway.
      
      Found during a quick audit of the entry code to try to track down an
      unrelated bug that Ingo found in some still-in-development code.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/e1aeb0e6ba8dd430ec36c8a35e63b429698b4132.1511411918.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 22 November 2017 (2 commits)
    • x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · f68d62a5
      Committed by Andrey Ryabinin
      [ Note, this commit is a cherry-picked version of:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 entry code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
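
      Conceptually the new interface walks the shadow range and maps freshly
      zeroed pages.  A simplified sketch, with early_alloc_zeroed() and
      early_map_page() as hypothetical helpers standing in for the real
      page-table plumbing:

      	void __init kasan_populate_shadow(unsigned long start, unsigned long end)
      	{
      		unsigned long addr;

      		/* The real code also tries 1G/2M mappings; this uses 4K only. */
      		for (addr = start; addr < end; addr += PAGE_SIZE) {
      			void *p = early_alloc_zeroed(PAGE_SIZE);	/* hypothetical */
      			early_map_page(addr, __pa(p), PAGE_KERNEL);	/* hypothetical */
      		}
      	}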
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing · 548c3050
      Committed by Andy Lutomirski
      When I added entry_SYSCALL_64_after_hwframe(), I left TRACE_IRQS_OFF
      before it.  This means that users of entry_SYSCALL_64_after_hwframe()
      were responsible for invoking TRACE_IRQS_OFF, and the one and only
      user (Xen, added in the same commit) got it wrong.
      
      I think this would manifest as a warning if a Xen PV guest with
      CONFIG_DEBUG_LOCKDEP=y were used with context tracking.  (The
      context tracking bit is to cause lockdep to get invoked before we
      turn IRQs back on.)  I haven't tested that for real yet because I
      can't get a kernel configured like that to boot at all on Xen PV.
      
      Move TRACE_IRQS_OFF below the label.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 8a9949bc ("x86/xen/64: Rearrange the SYSCALL entries")
      Link: http://lkml.kernel.org/r/9150aac013b7b95d62c2336751d5b6e91d2722aa.1511325444.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  14. 21 November 2017 (5 commits)
  15. 17 November 2017 (3 commits)
    • x86/smpboot: Fix __max_logical_packages estimate · b4c0a732
      Committed by Prarit Bhargava
      A system booted with a small number of cores enabled per package
      panics because the estimate of __max_logical_packages is too low.
      
      This occurs when the total number of active cores across all packages is
      less than the maximum core count for a single package. e.g.:
      
        On a 4 package system with 20 cores/package where only 4 cores are
        enabled on each package, the value of __max_logical_packages is
        calculated as DIV_ROUND_UP(16, 20) = 1 and not 4.
      
      Calculate __max_logical_packages after the cpu enumeration has completed.
      Use the boot cpu's data to extrapolate the number of packages.
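
      A rough sketch of the post-enumeration estimate; the exact fields the
      patch reads from the boot CPU may differ:

      	/* cores per package, as seen on the boot CPU after enumeration */
      	int ncpus = boot_cpu_data.x86_max_cores * smp_num_siblings;

      	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);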
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: He Chen <he.chen@linux.intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Piotr Luc <piotr.luc@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com
    • x86/topology: Avoid wasting 128k for package id array · 30bb9811
      Committed by Andi Kleen
      Analyzing large early boot allocations unveiled the logical package id
      storage as a prominent memory waste. Since commit 1f12e32f
      ("x86/topology: Create logical package id") every 64-bit system allocates a
      128k array to convert logical package ids.
      
      This happens because the array is sized for MAX_LOCAL_APIC which is always
      32k on 64bit systems, and it needs 4 bytes for each entry.
      
      This is fairly wasteful, especially for the common case of having only
      one socket, which uses exactly 4 bytes out of the 128K.
      
      There is no user of the package id map which is performance critical, so
      the lookup is not required to be O(1). Store the logical processor id in
      cpu_data and use a loop based lookup.
      
      To keep the mapping stable across cpu hotplug operations, add a flag to
      cpu_data which is set when the CPU is brought up the first time. When the
      flag is set, then cpu_data is not reinitialized by copying boot_cpu_data on
      subsequent bringups.
      
      [ tglx: Rename the flag to 'initialized', use proper pointers instead of
        	repeated cpu_data(x) evaluation and massage changelog. ]
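
      The O(n) lookup described above is then just a scan over cpu_data; a
      sketch of its likely shape:

      	int topology_phys_to_logical_pkg(unsigned int phys_pkg)
      	{
      		int cpu;

      		/* A linear scan is fine: no caller is performance critical. */
      		for_each_possible_cpu(cpu) {
      			struct cpuinfo_x86 *c = &cpu_data(cpu);

      			if (c->initialized && c->phys_proc_id == phys_pkg)
      				return c->logical_proc_id;
      		}
      		return -1;
      	}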
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: He Chen <he.chen@linux.intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Piotr Luc <piotr.luc@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/20171114124257.22013-3-prarit@redhat.com
    • perf/x86/intel/uncore: Cache logical pkg id in uncore driver · d46b4c1c
      Committed by Andi Kleen
      The SNB-EP uncore driver is the only user of topology_phys_to_logical_pkg
      in a performance critical path.
      
      Change it to query the logical pkg ID only once at initialization time
      and then cache it in the box structure.  This allows changing the
      logical package management without affecting the performance critical
      path.
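
      The caching amounts to one extra field filled in at box initialization
      time and read in the hot path; a sketch (the field name follows the
      commit title, the exact init site is not quoted here):

      	/* Resolve the logical package id once, when the box is set up. */
      	box->pkgid = topology_phys_to_logical_pkg(
      				topology_physical_package_id(cpu));

      	/* Hot-path code then reads box->pkgid instead of redoing the lookup. */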
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: He Chen <he.chen@linux.intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Piotr Luc <piotr.luc@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/20171114124257.22013-2-prarit@redhat.com