1. 01 9月, 2017 1 次提交
  2. 29 8月, 2017 10 次提交
    • P
      perf/x86: Fix caps/ for !Intel · 5da382eb
      Peter Zijlstra 提交于
      Move the 'max_precise' capability into generic x86 code where it
      belongs. This fixes a sysfs splat on !Intel systems where we fail to set
      x86_pmu_caps_group.atts.
      Reported-and-tested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Fixes: 22688d1c20f5 ("x86/perf: Export some PMU attributes in caps/ directory")
      Link: http://lkml.kernel.org/r/20170828104650.2u3rsim4jafyjzv2@hirez.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5da382eb
    • K
      perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR · fc7ce9c7
      Kan Liang 提交于
      For understanding how the workload maps to memory channels and hardware
      behavior, it's very important to collect address maps with physical
      addresses. For example, 3D XPoint access can only be found by filtering
      the physical address.
      
      Add a new sample type for physical address.
      
      perf already has a facility to collect data virtual address. This patch
      introduces a function to convert the virtual address to physical address.
      The function is quite generic and can be extended to any architecture as
      long as a virtual address is provided.
      
       - For kernel direct mapping addresses, virt_to_phys is used to convert
         the virtual addresses to physical address.
      
       - For user virtual addresses, __get_user_pages_fast is used to walk the
         pages tables for user physical address.
      
       - This does not work for vmalloc addresses right now. These are not
         resolved, but code to do that could be added.
      
      The new sample type requires collecting the virtual address. The
      virtual address will not be output unless SAMPLE_ADDR is applied.
      
      For security, the physical address can only be exposed to root or
      privileged user.
      Tested-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: mpe@ellerman.id.au
      Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fc7ce9c7
    • A
      perf/core, pt, bts: Get rid of itrace_started · 8d4e6c4c
      Alexander Shishkin 提交于
      I just noticed that hw.itrace_started and hw.config are aliased to the
      same location. Now, the PT driver happens to use both, which works out
      fine by sheer luck:
      
       - STORE(hw.itrace_start) is ordered before STORE(hw.config), in the
          program order, although there are no compiler barriers to ensure that,
      
       - to the perf_log_itrace_start() hw.itrace_start looks set at the same
         time as when it is intended to be set because both stores happen in the
         same path,
      
       - hw.config is never reset to zero in the PT driver.
      
      Now, the use of hw.config by the PT driver makes more sense (it being a
      HW PMU) than messing around with itrace_started, which is an awkward API
      to begin with.
      
      This patch replaces hw.itrace_started with an attach_state bit and an
      API call for the PMU drivers to use to communicate the condition.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8d4e6c4c
    • J
      x86/boot: Prevent faulty bootparams.screeninfo from causing harm · fb1cc2f9
      Jan H. Schönherr 提交于
      If a zero for the number of lines manages to slip through, scroll()
      may underflow some offset calculations, causing accesses outside the
      video memory.
      
      Make the check in __putstr() more pessimistic to prevent that.
      Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1503858223-14983-1-git-send-email-jschoenh@amazon.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fb1cc2f9
    • J
      x86/boot: Provide more slack space during decompression · 5746f055
      Jan H. Schönherr 提交于
      The current slack space is not enough for LZ4, which has a worst case
      overhead of 0.4% for data that cannot be further compressed. With
      an LZ4 compressed kernel with an embedded initrd, the output is likely
      to overwrite the input.
      
      Increase the slack space to avoid that.
      Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1503842124-29718-1-git-send-email-jschoenh@amazon.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5746f055
    • J
      x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone() · 49993489
      Jiri Slaby 提交于
      ALIGN+GLOBAL is effectively what ENTRY() does, so use ENTRY() which is
      dedicated for exactly this purpose -- global functions.
      
      Note that stub32_clone() is a C-like leaf function -- it has a standard
      call frame -- it only switches one argument and continues by jumping
      into C. Since each ENTRY() should be balanced by some END*() marker, we
      add a corresponding ENDPROC() to stub32_clone() too.
      
      Besides that, x86's custom GLOBAL macro is going to die very soon.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170824080624.7768-2-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
      49993489
    • J
      x86/fpu/math-emu: Add ENDPROC to functions · bd6be579
      Jiri Slaby 提交于
      Functions in math-emu are annotated as ENTRY() symbols, but their
      ends are not annotated at all. But these are standard functions
      called from C, with proper stack register update etc.
      
      Omitting the ends means:
      
        * the annotations are not paired and we cannot deal with such functions
          e.g. in objtool
      
        * the symbols are not marked as functions in the object file
      
        * there are no sizes of the functions in the object file
      
      So fix this by adding ENDPROC() to each such case in math-emu.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170824080624.7768-1-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bd6be579
    • J
      x86/boot/64: Extract efi_pe_entry() from startup_64() · 9e085cef
      Jiri Slaby 提交于
      Similarly to the 32-bit code, efi_pe_entry body() is somehow squashed into
      startup_64().
      
      In the old days, we forced startup_64() to start at offset 0x200 and efi_pe_entry()
      to start at 0x210. But this requirement was removed long time ago, in:
      
        99f857db ("x86, build: Dynamically find entry points in compressed startup code")
      
      The way it is now makes the code less readable and illogical. Given
      we can now safely extract the inlined efi_pe_entry() body from
      startup_64() into a separate function, we do so.
      
      We also annotate the function appropriatelly by ENTRY+ENDPROC.
      
      ABI offsets are preserved:
      
        0000000000000000 T startup_32
        0000000000000200 T startup_64
        0000000000000390 T efi64_stub_entry
      
      On the top-level, it looked like:
      
      	.org 0x200
      	ENTRY(startup_64)
      	#ifdef CONFIG_EFI_STUB		; start of inlined
      		jmp     preferred_addr
      	GLOBAL(efi_pe_entry)
      		... ; a lot of assembly (efi_pe_entry)
      		leaq    preferred_addr(%rax), %rax
      		jmp     *%rax
      	preferred_addr:
      	#endif				; end of inlined
      		... ; a lot of assembly (startup_64)
      	ENDPROC(startup_64)
      
      And it is now converted into:
      
      	.org 0x200
      	ENTRY(startup_64)
      		... ; a lot of assembly (startup_64)
      	ENDPROC(startup_64)
      
      	#ifdef CONFIG_EFI_STUB
      	ENTRY(efi_pe_entry)
      		... ; a lot of assembly (efi_pe_entry)
      		leaq    startup_64(%rax), %rax
      		jmp     *%rax
      	ENDPROC(efi_pe_entry)
      	#endif
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: ard.biesheuvel@linaro.org
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170824073327.4129-2-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9e085cef
    • J
      x86/boot/32: Extract efi_pe_entry() from startup_32() · f4dee0bb
      Jiri Slaby 提交于
      The efi_pe_entry() body is somehow squashed into startup_32(). In the old days,
      we forced startup_32() to start at offset 0x00 and efi_pe_entry() to start
      at 0x10.
      
      But this requirement was removed long time ago, in:
      
        99f857db ("x86, build: Dynamically find entry points in compressed startup code")
      
      The way it is now makes the code less readable and illogical. Given
      we can now safely extract the inlined efi_pe_entry() body from
      startup_32() into a separate function, we do so and we separate it to two
      functions as they are marked already: efi_pe_entry() + efi32_stub_entry().
      
      We also annotate the functions appropriatelly by ENTRY+ENDPROC.
      
      ABI offset is preserved:
      
        0000   128 FUNC    GLOBAL DEFAULT    6 startup_32
        0080    60 FUNC    GLOBAL DEFAULT    6 efi_pe_entry
        00bc    68 FUNC    GLOBAL DEFAULT    6 efi32_stub_entry
      
      On the top-level, it looked like this:
      
      	ENTRY(startup_32)
      	#ifdef CONFIG_EFI_STUB		; start of inlined
      		jmp     preferred_addr
      	ENTRY(efi_pe_entry)
      		... ; a lot of assembly (efi_pe_entry)
      	ENTRY(efi32_stub_entry)
      		... ; a lot of assembly (efi32_stub_entry)
      		leal    preferred_addr(%eax), %eax
      		jmp     *%eax
      	preferred_addr:
      	#endif				; end of inlined
      		... ; a lot of assembly (startup_32)
      	ENDPROC(startup_32)
      
      And it is now converted into:
      
      	ENTRY(startup_32)
      		... ; a lot of assembly (startup_32)
      	ENDPROC(startup_32)
      
      	#ifdef CONFIG_EFI_STUB
      	ENTRY(efi_pe_entry)
      		... ; a lot of assembly (efi_pe_entry)
      	ENDPROC(efi_pe_entry)
      
      	ENTRY(efi32_stub_entry)
      		... ; a lot of assembly (efi32_stub_entry)
      		leal    startup_32(%eax), %eax
      		jmp     *%eax
      	ENDPROC(efi32_stub_entry)
      	#endif
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: ard.biesheuvel@linaro.org
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170824073327.4129-1-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f4dee0bb
    • D
      x86/ldt: Fix off by one in get_segment_base() · eaa2f87c
      Dan Carpenter 提交于
      ldt->entries[] is allocated in alloc_ldt_struct().  It has
      ldt->nr_entries elements and ldt->nr_entries is capped at LDT_ENTRIES.
      So if "idx" is == ldt->nr_entries then we're reading beyond the end of
      the buffer.  It seems duplicative to have two limit checks when one
      would work just as well so I removed the check against LDT_ENTRIES.
      
      The gdt_page.gdt[] array has GDT_ENTRIES entries.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-janitors@vger.kernel.org
      Fixes: d07bdfd3 ("perf/x86: Fix USER/KERNEL tagging of samples properly")
      Link: http://lkml.kernel.org/r/20170818102516.gqwm4xdvvuvjw5ho@mwandaSigned-off-by: NIngo Molnar <mingo@kernel.org>
      eaa2f87c
  3. 25 8月, 2017 9 次提交
    • A
      perf/x86: Export some PMU attributes in caps/ directory · b00233b5
      Andi Kleen 提交于
      It can be difficult to figure out for user programs what features
      the x86 CPU PMU driver actually supports. Currently it requires
      grepping in dmesg, but dmesg is not always available.
      
      This adds a caps directory to /sys/bus/event_source/devices/cpu/,
      similar to the caps already used on intel_pt, which can be used to
      discover the available capabilities cleanly.
      
      Three capabilities are defined:
      
       - pmu_name:	Underlying CPU name known to the driver
       - max_precise:	Max precise level supported
       - branches:	Known depth of LBR.
      
      Example:
      
        % grep . /sys/bus/event_source/devices/cpu/caps/*
        /sys/bus/event_source/devices/cpu/caps/branches:32
        /sys/bus/event_source/devices/cpu/caps/max_precise:3
        /sys/bus/event_source/devices/cpu/caps/pmu_name:skylake
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170822185201.9261-3-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b00233b5
    • A
      perf/x86: Only show format attributes when supported · a5df70c3
      Andi Kleen 提交于
      Only show the Intel format attributes in sysfs when the feature is actually
      supported with the current model numbers. This allows programs to probe
      what format attributes are available, and give a sensible error message
      to users if they are not.
      
      This handles near all cases for intel attributes since Nehalem,
      except the (obscure) case when the model number if known, but PEBS
      is disabled in PERF_CAPABILITIES.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170822185201.9261-2-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a5df70c3
    • A
      perf/x86: Fix data source decoding for Skylake · 6ae5fa61
      Andi Kleen 提交于
      Skylake changed the encoding of the PEBS data source field.
      Some combinations are not available anymore, but some new cases
      e.g. for L4 cache hit are added.
      
      Fix up the conversion table for Skylake, similar as had been done
      for Nehalem.
      
      On Skylake server the encoding for L4 actually means persistent
      memory. Handle this case too.
      
      To properly describe it in the abstracted perf format I had to add
      some new fields. Since a hit can have only one level add a new
      field that is an enumeration, not a bit field to describe
      the level. It can describe any level. Some numbers are also
      used to describe PMEM and LFB.
      
      Also add a new generic remote flag that can be combined with
      the generic level to signify a remote cache.
      
      And there is an extension field for the snoop indication to handle
      the Forward state.
      
      I didn't add a generic flag for hops because it's not needed
      for Skylake.
      
      I changed the existing encodings for older CPUs to also fill in the
      new level and remote fields.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/20170816222156.19953-3-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6ae5fa61
    • A
      perf/x86: Move Nehalem PEBS code to flag · 95298355
      Andi Kleen 提交于
      Minor cleanup: use an explicit x86_pmu flag to handle the
      missing Lock / TLB information on Nehalem, instead of always
      checking the model number for each PEBS sample.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/20170816222156.19953-2-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      95298355
    • E
      x86/mm: Fix use-after-free of ldt_struct · ccd5b323
      Eric Biggers 提交于
      The following commit:
      
        39a0526f ("x86/mm: Factor out LDT init from context init")
      
      renamed init_new_context() to init_new_context_ldt() and added a new
      init_new_context() which calls init_new_context_ldt().  However, the
      error code of init_new_context_ldt() was ignored.  Consequently, if a
      memory allocation in alloc_ldt_struct() failed during a fork(), the
      ->context.ldt of the new task remained the same as that of the old task
      (due to the memcpy() in dup_mm()).  ldt_struct's are not intended to be
      shared, so a use-after-free occurred after one task exited.
      
      Fix the bug by making init_new_context() pass through the error code of
      init_new_context_ldt().
      
      This bug was found by syzkaller, which encountered the following splat:
      
          BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
          Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710
      
          CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           __dump_stack lib/dump_stack.c:16 [inline]
           dump_stack+0x194/0x257 lib/dump_stack.c:52
           print_address_description+0x73/0x250 mm/kasan/report.c:252
           kasan_report_error mm/kasan/report.c:351 [inline]
           kasan_report+0x24e/0x340 mm/kasan/report.c:409
           __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
           free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           exec_mmap fs/exec.c:1061 [inline]
           flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291
           load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855
           search_binary_handler+0x142/0x6b0 fs/exec.c:1652
           exec_binprm fs/exec.c:1694 [inline]
           do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816
           do_execve+0x31/0x40 fs/exec.c:1860
           call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
           ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      
          Allocated by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
           kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627
           kmalloc include/linux/slab.h:493 [inline]
           alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67
           write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277
           sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
          Freed by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
           __cache_free mm/slab.c:3503 [inline]
           kfree+0xca/0x250 mm/slab.c:3820
           free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           __mmput kernel/fork.c:916 [inline]
           mmput+0x541/0x6e0 kernel/fork.c:927
           copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931
           copy_process kernel/fork.c:1546 [inline]
           _do_fork+0x1ef/0xfb0 kernel/fork.c:2025
           SYSC_clone kernel/fork.c:2135 [inline]
           SyS_clone+0x37/0x50 kernel/fork.c:2129
           do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287
           return_from_SYSCALL_64+0x0/0x7a
      
      Here is a C reproducer:
      
          #include <asm/ldt.h>
          #include <pthread.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <sys/syscall.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          static void *fork_thread(void *_arg)
          {
              fork();
          }
      
          int main(void)
          {
              struct user_desc desc = { .entry_number = 8191 };
      
              syscall(__NR_modify_ldt, 1, &desc, sizeof(desc));
      
              for (;;) {
                  if (fork() == 0) {
                      pthread_t t;
      
                      srand(getpid());
                      pthread_create(&t, NULL, fork_thread, NULL);
                      usleep(rand() % 10000);
                      syscall(__NR_exit_group, 0);
                  }
                  wait(NULL);
              }
          }
      
      Note: the reproducer takes advantage of the fact that alloc_ldt_struct()
      may use vmalloc() to allocate a large ->entries array, and after
      commit:
      
        5d17a73a ("vmalloc: back off when the current task is killed")
      
      it is possible for userspace to fail a task's vmalloc() by
      sending a fatal signal, e.g. via exit_group().  It would be more
      difficult to reproduce this bug on kernels without that commit.
      
      This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Acked-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org> [v4.6+]
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Fixes: 39a0526f ("x86/mm: Factor out LDT init from context init")
      Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ccd5b323
    • P
      KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state · 38cfd5e3
      Paolo Bonzini 提交于
      The host pkru is restored right after vcpu exit (commit 1be0e61c), so
      KVM_GET_XSAVE will return the host PKRU value instead.  Fix this by
      using the guest PKRU explicitly in fill_xsave and load_xsave.  This
      part is based on a patch by Junkang Fu.
      
      The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state,
      because it could have been changed by userspace since the last time
      it was saved, so skip loading it in kvm_load_guest_fpu.
      Reported-by: NJunkang Fu <junkang.fjk@alibaba-inc.com>
      Cc: Yang Zhang <zy107165@alibaba-inc.com>
      Fixes: 1be0e61c
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      38cfd5e3
    • P
      KVM: x86: simplify handling of PKRU · b9dd21e1
      Paolo Bonzini 提交于
      Move it to struct kvm_arch_vcpu, replacing guest_pkru_valid with a
      simple comparison against the host value of the register.  The write of
      PKRU in addition can be skipped if the guest has not enabled the feature.
      Once we do this, we need not test OSPKE in the host anymore, because
      guest_CR4.PKE=1 implies host_CR4.PKE=1.
      
      The static PKU test is kept to elide the code on older CPUs.
      Suggested-by: NYang Zhang <zy107165@alibaba-inc.com>
      Fixes: 1be0e61c
      Cc: stable@vger.kernel.org
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b9dd21e1
    • P
      KVM: x86: block guest protection keys unless the host has them enabled · c469268c
      Paolo Bonzini 提交于
      If the host has protection keys disabled, we cannot read and write the
      guest PKRU---RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0.  Block
      the PKU cpuid bit in that case.
      
      This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1.
      
      Fixes: 1be0e61c
      Cc: stable@vger.kernel.org
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c469268c
    • F
      um: Fix check for _xstate for older hosts · 2fb44600
      Florian Fainelli 提交于
      Commit 0a987645 ("um: Allow building and running on older
      hosts") attempted to check for PTRACE_{GET,SET}REGSET under the premise
      that these ptrace(2) parameters were directly linked with the presence
      of the _xstate structure.
      
      After Richard's commit 61e8d462 ("um: Correctly check for
      PTRACE_GETRESET/SETREGSET") which properly included linux/ptrace.h
      instead of asm/ptrace.h, we could get into the original build failure
      that I reported:
      
      arch/x86/um/user-offsets.c: In function 'foo':
      arch/x86/um/user-offsets.c:54: error: invalid application of 'sizeof' to
      incomplete type 'struct _xstate'
      
      On this particular host, we do have PTRACE_GETREGSET and
      PTRACE_SETREGSET defined in linux/ptrace.h, but not the structure
      _xstate that should be pulled from the following include chain: signal.h
      -> bits/sigcontext.h.
      
      Correctly fix this by checking for FP_XSTATE_MAGIC1 which is the correct
      way to see if struct _xstate is available or not on the host.
      
      Fixes: 61e8d462 ("um: Correctly check for PTRACE_GETRESET/SETREGSET")
      Fixes: 0a987645 ("um: Allow building and running on older hosts")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      2fb44600
  4. 24 8月, 2017 2 次提交
  5. 19 8月, 2017 2 次提交
  6. 18 8月, 2017 2 次提交
    • T
      kernel/watchdog: Prevent false positives with turbo modes · 7edaeb68
      Thomas Gleixner 提交于
      The hardlockup detector on x86 uses a performance counter based on unhalted
      CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
      performance counter period, so the hrtimer should fire 2-3 times before the
      performance counter NMI fires. The NMI code checks whether the hrtimer
      fired since the last invocation. If not, it assumess a hard lockup.
      
      The calculation of those periods is based on the nominal CPU
      frequency. Turbo modes increase the CPU clock frequency and therefore
      shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
      nominal frequency) the perf/NMI period is shorter than the hrtimer period
      which leads to false positives.
      
      A simple fix would be to shorten the hrtimer period, but that comes with
      the side effect of more frequent hrtimer and softlockup thread wakeups,
      which is not desired.
      
      Implement a low pass filter, which checks the perf/NMI period against
      kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
      elapsed then the event is ignored and postponed to the next perf/NMI.
      
      That solves the problem and avoids the overhead of shorter hrtimer periods
      and more frequent softlockup thread wakeups.
      
      Fixes: 58687acb ("lockup_detector: Combine nmi_watchdog and softlockup detector")
      Reported-and-tested-by: NKan Liang <Kan.liang@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: dzickus@redhat.com
      Cc: prarit@redhat.com
      Cc: ak@linux.intel.com
      Cc: babu.moger@oracle.com
      Cc: peterz@infradead.org
      Cc: eranian@google.com
      Cc: acme@redhat.com
      Cc: stable@vger.kernel.org
      Cc: atomlin@redhat.com
      Cc: akpm@linux-foundation.org
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
      7edaeb68
    • A
      x86: Constify attribute_group structures · 45bd07ad
      Arvind Yadav 提交于
      attribute_groups are not supposed to change at runtime and none of the
      groups is modified.
      
      Mark the non-const structs as const.
      
      [ tglx: Folded into one big patch ]
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: tony.luck@intel.com
      Cc: bp@alien8.de
      Link: http://lkml.kernel.org/r/1500550238-15655-2-git-send-email-arvind.yadav.cs@gmail.com
      45bd07ad
  7. 17 8月, 2017 3 次提交
  8. 15 8月, 2017 2 次提交
  9. 11 8月, 2017 6 次提交
  10. 10 8月, 2017 3 次提交