1. 01 Sep 2017: 4 commits
  2. 31 Aug 2017: 2 commits
  3. 30 Aug 2017: 9 commits
  4. 29 Aug 2017: 13 commits
    • MIPS: Remove pt_regs adjustments in indirect syscall handler · 5af2ed36
      Authored by James Cowgill
      If a restartable syscall is called using the indirect o32 syscall
      handler - e.g. syscall(__NR_waitid, ...) - then it is possible for the
      incorrect arguments to be passed to the syscall after it has been
      restarted. This is because the syscall handler tries to shift all the
      registers down one place in pt_regs so that when the syscall is restarted,
      the "real" syscall is called instead. Unfortunately it only shifts the
      arguments passed in registers, not the arguments on the user stack. This
      causes the 4th argument to be duplicated when the syscall is restarted.
      
      Fix by removing all the pt_regs shifting so that the indirect syscall
      handler is called again when the syscall is restarted. The comment "some
      syscalls like execve get their arguments from struct pt_regs" is long
      out of date so this should now be safe.
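
      As an illustration (a hypothetical userspace snippet, not part of the
      patch), an indirect o32 syscall puts the real syscall number in a0, so
      the kernel sees the real arguments shifted up by one slot:

          #include <signal.h>
          #include <sys/syscall.h>
          #include <sys/wait.h>
          #include <unistd.h>

          int main(void)
          {
              siginfo_t info;

              /* On MIPS o32 this lands in the kernel's indirect syscall
               * handler: a0 = __NR_waitid, a1..a3 = the first three waitid
               * arguments, and the remaining ones spill onto the user
               * stack -- the part the old restart logic failed to shift. */
              syscall(__NR_waitid, P_ALL, 0, &info, WEXITED, NULL);
              return 0;
          }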
      Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
      Reviewed-by: James Hogan <james.hogan@imgtec.com>
      Tested-by: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/15856/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: seccomp: Fix indirect syscall args · 3d729dea
      Authored by James Hogan
      Since commit 669c4092 ("MIPS: Give __secure_computing() access to
      syscall arguments."), upon syscall entry when seccomp is enabled,
      syscall_trace_enter() passes a carefully prepared struct seccomp_data
      containing syscall arguments to __secure_computing(). Unfortunately it
      directly uses mips_get_syscall_arg() and fails to take into account the
      indirect O32 system calls (i.e. syscall(2)) which put the system call
      number in a0 and have the arguments shifted up by one entry.
      
      We can't just revert that commit as samples/bpf/tracex5 would break
      again, so use syscall_get_arguments() which already takes indirect
      syscalls into account instead of directly using mips_get_syscall_arg(),
      similar to what populate_seccomp_data() does.
      
      This also removes the redundant error checking of the
      mips_get_syscall_arg() return value (get_user() already zeroes the
      result if an argument from the stack can't be loaded).
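
      A minimal sketch of the fixed collection path (an assumed shape, not
      the verbatim diff; the helpers are the kernel's generic syscall
      tracing API), as a fragment of syscall_trace_enter():

          struct seccomp_data sd;
          unsigned long args[6];
          int i;

          sd.nr = syscall;
          sd.arch = syscall_get_arch();
          /* syscall_get_arguments() already understands indirect o32
           * syscalls, unlike a direct mips_get_syscall_arg() loop. */
          syscall_get_arguments(current, regs, 0, 6, args);
          for (i = 0; i < 6; i++)
              sd.args[i] = args[i];
          sd.instruction_pointer = KSTK_EIP(current);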
      Reported-by: James Cowgill <James.Cowgill@imgtec.com>
      Fixes: 669c4092 ("MIPS: Give __secure_computing() access to syscall arguments.")
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16994/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • perf/x86: Fix caps/ for !Intel · 5da382eb
      Authored by Peter Zijlstra
      Move the 'max_precise' capability into generic x86 code where it
      belongs. This fixes a sysfs splat on !Intel systems where we fail to set
      x86_pmu_caps_group.attrs.
      Reported-and-tested-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Fixes: 22688d1c20f5 ("x86/perf: Export some PMU attributes in caps/ directory")
      Link: http://lkml.kernel.org/r/20170828104650.2u3rsim4jafyjzv2@hirez.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR · fc7ce9c7
      Authored by Kan Liang
      For understanding how the workload maps to memory channels and hardware
      behavior, it's very important to collect address maps with physical
      addresses. For example, 3D XPoint access can only be found by filtering
      the physical address.
      
      Add a new sample type for physical address.
      
      perf already has a facility to collect the data virtual address. This
      patch introduces a function to convert the virtual address to a physical
      address. The function is quite generic and can be extended to any
      architecture as long as a virtual address is provided.
      
       - For kernel direct mapping addresses, virt_to_phys is used to convert
         the virtual addresses to physical address.
      
       - For user virtual addresses, __get_user_pages_fast is used to walk the
         page tables for the user physical address.
      
       - This does not work for vmalloc addresses right now. These are not
         resolved, but code to do that could be added.
      
      The new sample type requires collecting the virtual address. The
      virtual address will not be output unless SAMPLE_ADDR is applied.
      
      For security, the physical address can only be exposed to root or a
      privileged user.
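
      A hedged sketch of the conversion described above (simplified; the
      helper added by the patch is perf_virt_to_phys(), but treat the body
      here as illustrative):

          static u64 virt_to_phys_sample(u64 virt)
          {
              u64 phys_addr = 0;
              struct page *p;

              if (virt >= TASK_SIZE) {
                  /* Kernel direct-mapping address. */
                  if (virt_addr_valid((void *)(unsigned long)virt))
                      phys_addr = (u64)virt_to_phys((void *)(unsigned long)virt);
              } else {
                  /* User address: walk the page tables. */
                  if (__get_user_pages_fast(virt, 1, 0, &p) == 1) {
                      phys_addr = page_to_phys(p) + virt % PAGE_SIZE;
                      put_page(p);
                  }
              }
              return phys_addr;
          }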
      Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: mpe@ellerman.id.au
      Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core, pt, bts: Get rid of itrace_started · 8d4e6c4c
      Authored by Alexander Shishkin
      I just noticed that hw.itrace_started and hw.config are aliased to the
      same location. Now, the PT driver happens to use both, which works out
      fine by sheer luck:
      
       - STORE(hw.itrace_started) is ordered before STORE(hw.config) in
         program order, although there are no compiler barriers to ensure that,

       - to perf_log_itrace_start(), hw.itrace_started looks set at the same
         time as when it is intended to be set, because both stores happen in
         the same path,
      
       - hw.config is never reset to zero in the PT driver.
      
      Now, the use of hw.config by the PT driver makes more sense (it being a
      HW PMU) than messing around with itrace_started, which is an awkward API
      to begin with.
      
      This patch replaces hw.itrace_started with an attach_state bit and an
      API call for the PMU drivers to use to communicate the condition.
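
      A hedged sketch of the resulting API shape (an attach_state bit set
      via a small helper; names follow the patch as best understood, so
      treat the exact form as illustrative):

          /* Called by PMU drivers (PT, BTS) when instruction tracing
           * actually starts, instead of writing hw.itrace_started: */
          static inline void perf_event_itrace_started(struct perf_event *event)
          {
              event->attach_state |= PERF_ATTACH_ITRACE;
          }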
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Prevent faulty bootparams.screeninfo from causing harm · fb1cc2f9
      Authored by Jan H. Schönherr
      If a zero for the number of lines manages to slip through, scroll()
      may underflow some offset calculations, causing accesses outside the
      video memory.
      
      Make the check in __putstr() more pessimistic to prevent that.
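
      A hedged sketch of the kind of check meant here (illustrative, not
      the exact diff):

          /* Bail out unless both screen dimensions are plausible; a zero
           * 'lines' would let scroll()'s offset arithmetic underflow. */
          if (lines == 0 || cols == 0)
              return;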
      Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1503858223-14983-1-git-send-email-jschoenh@amazon.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Provide more slack space during decompression · 5746f055
      Authored by Jan H. Schönherr
      The current slack space is not enough for LZ4, which has a worst case
      overhead of 0.4% for data that cannot be further compressed. With
      an LZ4 compressed kernel with an embedded initrd, the output is likely
      to overwrite the input.
      
      Increase the slack space to avoid that.
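
      For reference, LZ4's worst-case bound for incompressible input is
      n + n/255 + 16 bytes, i.e. roughly 0.4% overhead; a sketch of that
      arithmetic (an illustrative macro in the shape of LZ4_COMPRESSBOUND,
      not the kernel's exact slack computation):

          /* Worst-case LZ4 output size for n bytes of input: */
          #define LZ4_WORST_CASE(n)   ((n) + (n) / 255 + 16)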
      Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1503842124-29718-1-git-send-email-jschoenh@amazon.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone() · 49993489
      Authored by Jiri Slaby
      ALIGN+GLOBAL is effectively what ENTRY() does, so use ENTRY() which is
      dedicated for exactly this purpose -- global functions.
      
      Note that stub32_clone() is a C-like leaf function -- it has a standard
      call frame -- it only switches one argument and continues by jumping
      into C. Since each ENTRY() should be balanced by some END*() marker, we
      add a corresponding ENDPROC() to stub32_clone() too.
      
      Besides that, x86's custom GLOBAL macro is going to die very soon.
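
      A hedged sketch of the resulting shape (the body shown is the
      tls/child_tidptr swap between the 32-bit and 64-bit clone ABIs;
      treat it as illustrative rather than the verbatim file):

          ENTRY(stub32_clone)
              /* The 32-bit clone ABI passes tls and child_tidptr in the
               * opposite order from the 64-bit one, so switch them: */
              xchg    %r8, %rcx
              jmp     sys_clone
          ENDPROC(stub32_clone)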
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170824080624.7768-2-jslaby@suse.cz
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu/math-emu: Add ENDPROC to functions · bd6be579
      Authored by Jiri Slaby
      Functions in math-emu are annotated as ENTRY() symbols, but their
      ends are not annotated at all, even though these are standard functions
      called from C, with proper stack register updates etc.
      
      Omitting the ends means:
      
        * the annotations are not paired and we cannot deal with such functions
          e.g. in objtool
      
        * the symbols are not marked as functions in the object file
      
        * there are no sizes of the functions in the object file
      
      So fix this by adding ENDPROC() to each such case in math-emu.
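
      The pattern being applied, using one of the math-emu helpers as an
      example (a sketch; the function bodies themselves are unchanged):

          ENTRY(FPU_u_add)
              ... ; existing function body
          ENDPROC(FPU_u_add)      ; newly added end annotation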
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170824080624.7768-1-jslaby@suse.cz
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/64: Extract efi_pe_entry() from startup_64() · 9e085cef
      Authored by Jiri Slaby
      Similarly to the 32-bit code, the efi_pe_entry() body is somehow squashed
      into startup_64().

      In the old days, we forced startup_64() to start at offset 0x200 and
      efi_pe_entry() to start at 0x210. But this requirement was removed a long
      time ago, in:
      
        99f857db ("x86, build: Dynamically find entry points in compressed startup code")
      
      The way it is now makes the code less readable and illogical. Given that
      we can now safely extract the inlined efi_pe_entry() body from
      startup_64() into a separate function, we do so.

      We also annotate the function appropriately with ENTRY+ENDPROC.
      
      ABI offsets are preserved:
      
        0000000000000000 T startup_32
        0000000000000200 T startup_64
        0000000000000390 T efi64_stub_entry
      
      On the top-level, it looked like:
      
      	.org 0x200
      	ENTRY(startup_64)
      	#ifdef CONFIG_EFI_STUB		; start of inlined
      		jmp     preferred_addr
      	GLOBAL(efi_pe_entry)
      		... ; a lot of assembly (efi_pe_entry)
      		leaq    preferred_addr(%rax), %rax
      		jmp     *%rax
      	preferred_addr:
      	#endif				; end of inlined
      		... ; a lot of assembly (startup_64)
      	ENDPROC(startup_64)
      
      And it is now converted into:
      
      	.org 0x200
      	ENTRY(startup_64)
      		... ; a lot of assembly (startup_64)
      	ENDPROC(startup_64)
      
      	#ifdef CONFIG_EFI_STUB
      	ENTRY(efi_pe_entry)
      		... ; a lot of assembly (efi_pe_entry)
      		leaq    startup_64(%rax), %rax
      		jmp     *%rax
      	ENDPROC(efi_pe_entry)
      	#endif
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: ard.biesheuvel@linaro.org
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170824073327.4129-2-jslaby@suse.cz
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/32: Extract efi_pe_entry() from startup_32() · f4dee0bb
      Authored by Jiri Slaby
      The efi_pe_entry() body is somehow squashed into startup_32(). In the old
      days, we forced startup_32() to start at offset 0x00 and efi_pe_entry()
      to start at 0x10.

      But this requirement was removed a long time ago, in:
      
        99f857db ("x86, build: Dynamically find entry points in compressed startup code")
      
      The way it is now makes the code less readable and illogical. Given that
      we can now safely extract the inlined efi_pe_entry() body from
      startup_32() into a separate function, we do so, splitting it into the
      two functions that are already marked as such: efi_pe_entry() +
      efi32_stub_entry().

      We also annotate the functions appropriately with ENTRY+ENDPROC.
      
      ABI offset is preserved:
      
        0000   128 FUNC    GLOBAL DEFAULT    6 startup_32
        0080    60 FUNC    GLOBAL DEFAULT    6 efi_pe_entry
        00bc    68 FUNC    GLOBAL DEFAULT    6 efi32_stub_entry
      
      On the top-level, it looked like this:
      
      	ENTRY(startup_32)
      	#ifdef CONFIG_EFI_STUB		; start of inlined
      		jmp     preferred_addr
      	ENTRY(efi_pe_entry)
      		... ; a lot of assembly (efi_pe_entry)
      	ENTRY(efi32_stub_entry)
      		... ; a lot of assembly (efi32_stub_entry)
      		leal    preferred_addr(%eax), %eax
      		jmp     *%eax
      	preferred_addr:
      	#endif				; end of inlined
      		... ; a lot of assembly (startup_32)
      	ENDPROC(startup_32)
      
      And it is now converted into:
      
      	ENTRY(startup_32)
      		... ; a lot of assembly (startup_32)
      	ENDPROC(startup_32)
      
      	#ifdef CONFIG_EFI_STUB
      	ENTRY(efi_pe_entry)
      		... ; a lot of assembly (efi_pe_entry)
      	ENDPROC(efi_pe_entry)
      
      	ENTRY(efi32_stub_entry)
      		... ; a lot of assembly (efi32_stub_entry)
      		leal    startup_32(%eax), %eax
      		jmp     *%eax
      	ENDPROC(efi32_stub_entry)
      	#endif
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: ard.biesheuvel@linaro.org
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170824073327.4129-1-jslaby@suse.cz
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/ldt: Fix off by one in get_segment_base() · eaa2f87c
      Authored by Dan Carpenter
      ldt->entries[] is allocated in alloc_ldt_struct().  It has
      ldt->nr_entries elements, and ldt->nr_entries is capped at LDT_ENTRIES.
      So if "idx" == ldt->nr_entries, then we're reading beyond the end of
      the buffer.  It seems duplicative to have two limit checks when one
      would work just as well, so I removed the check against LDT_ENTRIES.
      
      The gdt_page.gdt[] array has GDT_ENTRIES entries.
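
      A hedged sketch of the corrected LDT bound check (simplified,
      illustrative of the off-by-one rather than the verbatim diff):

          /* was: idx > ldt->nr_entries, which let idx == nr_entries
           * index one element past the end of ldt->entries[] */
          if (!ldt || idx >= ldt->nr_entries)
              return 0;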
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-janitors@vger.kernel.org
      Fixes: d07bdfd3 ("perf/x86: Fix USER/KERNEL tagging of samples properly")
      Link: http://lkml.kernel.org/r/20170818102516.gqwm4xdvvuvjw5ho@mwanda
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • ARCv2: SMP: Mask only private-per-core IRQ lines on boot at core intc · e8206d2b
      Authored by Alexey Brodkin
      Recent commit a8ec3ee8 "arc: Mask individual IRQ lines during core
      INTC init" breaks interrupt handling on ARCv2 SMP systems.
      
      That commit masked all interrupts at onset, as some controllers on some
      boards (customer as well as internal) would assert interrupts early,
      before any handlers were installed.  For SMP systems, the masking was
      done at each cpu's core-intc.  Later, when the IRQ was actually
      requested, it was unmasked, but only on the requesting cpu.

      For "common" interrupts, which were wired up from the 2nd level IDU
      intc, this was an issue, as they needed to be enabled on ALL the cpus
      (given that IDU IRQs are by default served Round Robin across cpus).

      So fix that by NOT masking "common" interrupts at core-intc, but instead
      at the 2nd level IDU intc (the latter is already done in idu_of_init()).
      
      Fixes: a8ec3ee8 ("arc: Mask individual IRQ lines during core INTC init")
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      [vgupta: reworked changelog, removed the extraneous idu_irq_mask_raw()]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 28 Aug 2017: 2 commits
  6. 25 Aug 2017: 10 commits
    • KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce() · 47c5310a
      Authored by Paul Mackerras
      Nixiaoming pointed out that there is a memory leak in
      kvm_vm_ioctl_create_spapr_tce() if the call to anon_inode_getfd()
      fails; the memory allocated for the kvmppc_spapr_tce_table struct
      is not freed, nor are the pages allocated for the iommu
      tables.  In addition, we have already incremented the process's
      count of locked memory pages, and this doesn't get restored on
      error.
      
      David Hildenbrand pointed out that there is a race in that the
      function checks early on that there is not already an entry in the
      stt->iommu_tables list with the same LIOBN, but an entry with the
      same LIOBN could get added between then and when the new entry is
      added to the list.
      
      This fixes all three problems.  To simplify things, we now call
      anon_inode_getfd() before placing the new entry in the list.  The
      check for an existing entry is done while holding the kvm->lock
      mutex, immediately before adding the new entry to the list.
      Finally, on failure we now call kvmppc_account_memlimit to
      decrement the process's count of locked memory pages.
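
      A hedged sketch of the reordered logic (simplified; identifiers
      follow the KVM PPC code, but this is illustrative rather than the
      exact diff):

          /* Create the fd first: on failure nothing has been published,
           * so the unwind just frees stt, its pages, and the accounting. */
          ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops, stt,
                                 O_RDWR | O_CLOEXEC);
          if (ret < 0)
              goto fail;

          mutex_lock(&kvm->lock);
          /* The duplicate-LIOBN check now runs under kvm->lock, right
           * before publishing the entry, closing the race window. */
          list_for_each_entry(siter, &kvm->arch.spapr_tce_tables, list) {
              if (siter->liobn == args->liobn) {
                  mutex_unlock(&kvm->lock);
                  goto fail_fd;       /* also releases the new fd */
              }
          }
          list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
          mutex_unlock(&kvm->lock);
          return ret;                 /* the new fd */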
      Reported-by: Nixiaoming <nixiaoming@huawei.com>
      Reported-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • perf/x86: Export some PMU attributes in caps/ directory · b00233b5
      Authored by Andi Kleen
      It can be difficult for user programs to figure out what features
      the x86 CPU PMU driver actually supports. Currently this requires
      grepping in dmesg, but dmesg is not always available.
      
      This adds a caps directory to /sys/bus/event_source/devices/cpu/,
      similar to the caps already used on intel_pt, which can be used to
      discover the available capabilities cleanly.
      
      Three capabilities are defined:
      
       - pmu_name:	Underlying CPU name known to the driver
       - max_precise:	Max precise level supported
       - branches:	Known depth of LBR.
      
      Example:
      
        % grep . /sys/bus/event_source/devices/cpu/caps/*
        /sys/bus/event_source/devices/cpu/caps/branches:32
        /sys/bus/event_source/devices/cpu/caps/max_precise:3
        /sys/bus/event_source/devices/cpu/caps/pmu_name:skylake
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170822185201.9261-3-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Only show format attributes when supported · a5df70c3
      Authored by Andi Kleen
      Only show the Intel format attributes in sysfs when the feature is actually
      supported with the current model numbers. This allows programs to probe
      what format attributes are available, and give a sensible error message
      to users if they are not.
      
      This handles nearly all cases for Intel attributes since Nehalem,
      except the (obscure) case when the model number is known but PEBS
      is disabled in PERF_CAPABILITIES.
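
      A hedged sketch of the mechanism (a sysfs is_visible callback that
      hides the attributes at runtime; the condition shown is
      illustrative):

          static umode_t
          exra_is_visible(struct kobject *kobj, struct attribute *attr, int i)
          {
              /* Expose the extended format attributes only when the PMU
               * architectural version supports them. */
              return x86_pmu.version >= 2 ? attr->mode : 0;
          }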
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170822185201.9261-2-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Fix data source decoding for Skylake · 6ae5fa61
      Authored by Andi Kleen
      Skylake changed the encoding of the PEBS data source field.
      Some combinations are not available anymore, but some new cases
      e.g. for L4 cache hit are added.
      
      Fix up the conversion table for Skylake, similar to what was done
      for Nehalem.
      
      On Skylake server the encoding for L4 actually means persistent
      memory. Handle this case too.
      
      To properly describe it in the abstracted perf format, I had to add
      some new fields. Since a hit can have only one level, add a new
      field that is an enumeration, not a bit field, to describe the
      level. It can describe any level. Some of the enumeration values
      are also used to describe PMEM and LFB.
      
      Also add a new generic remote flag that can be combined with
      the generic level to signify a remote cache.
      
      And there is an extension field for the snoop indication to handle
      the Forward state.
      
      I didn't add a generic flag for hops because it's not needed
      for Skylake.
      
      I changed the existing encodings for older CPUs to also fill in the
      new level and remote fields.
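
      A hedged sketch of the new abstractions in the perf_mem_data_src
      encoding (values are illustrative of the shape; see
      uapi/linux/perf_event.h for the authoritative definitions):

          /* Hit level as an enumeration rather than a bit field: */
          #define PERF_MEM_LVLNUM_L1      0x01
          #define PERF_MEM_LVLNUM_L4      0x04
          #define PERF_MEM_LVLNUM_LFB     0x0c    /* line fill buffer */
          #define PERF_MEM_LVLNUM_PMEM    0x0e    /* persistent memory */

          /* Generic remote flag, combinable with the level: */
          #define PERF_MEM_REMOTE_REMOTE  0x01

          /* Snoop extension for the Forward state: */
          #define PERF_MEM_SNOOPX_FWD     0x01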
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/20170816222156.19953-3-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Move Nehalem PEBS code to flag · 95298355
      Authored by Andi Kleen
      Minor cleanup: use an explicit x86_pmu flag to handle the
      missing Lock / TLB information on Nehalem, instead of always
      checking the model number for each PEBS sample.
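
      A hedged sketch of the idea (the flag name is an assumption, chosen
      for illustration; the point is a one-time model flag instead of a
      per-sample model check):

          /* Set once at init time on Nehalem: */
          x86_pmu.pebs_no_tlb = 1;    /* hypothetical flag name */

          /* The per-sample decode then tests the flag, not the model: */
          if (x86_pmu.pebs_no_tlb)
              dse.ld_stlb_miss = 0;   /* no Lock / TLB info in PEBS */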
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/20170816222156.19953-2-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Fix use-after-free of ldt_struct · ccd5b323
      Authored by Eric Biggers
      The following commit:
      
        39a0526f ("x86/mm: Factor out LDT init from context init")
      
      renamed init_new_context() to init_new_context_ldt() and added a new
      init_new_context() which calls init_new_context_ldt().  However, the
      error code of init_new_context_ldt() was ignored.  Consequently, if a
      memory allocation in alloc_ldt_struct() failed during a fork(), the
      ->context.ldt of the new task remained the same as that of the old task
      (due to the memcpy() in dup_mm()).  ldt_struct's are not intended to be
      shared, so a use-after-free occurred after one task exited.
      
      Fix the bug by making init_new_context() pass through the error code of
      init_new_context_ldt().
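
      A hedged sketch of the essential change (propagate the error code
      instead of discarding it):

          static inline int init_new_context(struct task_struct *tsk,
                                             struct mm_struct *mm)
          {
              /* ... other context init ... */

              /* was: init_new_context_ldt(tsk, mm); return 0;
               * now an allocation failure actually reaches the caller: */
              return init_new_context_ldt(tsk, mm);
          }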
      
      This bug was found by syzkaller, which encountered the following splat:
      
          BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
          Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710
      
          CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           __dump_stack lib/dump_stack.c:16 [inline]
           dump_stack+0x194/0x257 lib/dump_stack.c:52
           print_address_description+0x73/0x250 mm/kasan/report.c:252
           kasan_report_error mm/kasan/report.c:351 [inline]
           kasan_report+0x24e/0x340 mm/kasan/report.c:409
           __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
           free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           exec_mmap fs/exec.c:1061 [inline]
           flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291
           load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855
           search_binary_handler+0x142/0x6b0 fs/exec.c:1652
           exec_binprm fs/exec.c:1694 [inline]
           do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816
           do_execve+0x31/0x40 fs/exec.c:1860
           call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
           ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      
          Allocated by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
           kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627
           kmalloc include/linux/slab.h:493 [inline]
           alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67
           write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277
           sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
          Freed by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
           __cache_free mm/slab.c:3503 [inline]
           kfree+0xca/0x250 mm/slab.c:3820
           free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           __mmput kernel/fork.c:916 [inline]
           mmput+0x541/0x6e0 kernel/fork.c:927
           copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931
           copy_process kernel/fork.c:1546 [inline]
           _do_fork+0x1ef/0xfb0 kernel/fork.c:2025
           SYSC_clone kernel/fork.c:2135 [inline]
           SyS_clone+0x37/0x50 kernel/fork.c:2129
           do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287
           return_from_SYSCALL_64+0x0/0x7a
      
      Here is a C reproducer:
      
          #include <asm/ldt.h>
          #include <pthread.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <sys/syscall.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          static void *fork_thread(void *_arg)
          {
              fork();
              return NULL; /* the start routine must return a void * */
          }
      
          int main(void)
          {
              struct user_desc desc = { .entry_number = 8191 };
      
              syscall(__NR_modify_ldt, 1, &desc, sizeof(desc));
      
              for (;;) {
                  if (fork() == 0) {
                      pthread_t t;
      
                      srand(getpid());
                      pthread_create(&t, NULL, fork_thread, NULL);
                      usleep(rand() % 10000);
                      syscall(__NR_exit_group, 0);
                  }
                  wait(NULL);
              }
          }
      
      Note: the reproducer takes advantage of the fact that alloc_ldt_struct()
      may use vmalloc() to allocate a large ->entries array, and after
      commit:
      
        5d17a73a ("vmalloc: back off when the current task is killed")
      
      it is possible for userspace to fail a task's vmalloc() by
      sending a fatal signal, e.g. via exit_group().  It would be more
      difficult to reproduce this bug on kernels without that commit.
      
      This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org> [v4.6+]
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Fixes: 39a0526f ("x86/mm: Factor out LDT init from context init")
      Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state · 38cfd5e3
      Authored by Paolo Bonzini
      The host PKRU is restored right after vcpu exit (commit 1be0e61c), so
      KVM_GET_XSAVE will return the host PKRU value instead.  Fix this by
      using the guest PKRU explicitly in fill_xsave and load_xsave.  This
      part is based on a patch by Junkang Fu.
      
      The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state,
      because it could have been changed by userspace since the last time
      it was saved, so skip loading it in kvm_load_guest_fpu.
      Reported-by: Junkang Fu <junkang.fjk@alibaba-inc.com>
      Cc: Yang Zhang <zy107165@alibaba-inc.com>
      Fixes: 1be0e61c
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: simplify handling of PKRU · b9dd21e1
      Authored by Paolo Bonzini
      Move it to struct kvm_vcpu_arch, replacing guest_pkru_valid with a
      simple comparison against the host value of the register.  The write of
      PKRU can additionally be skipped if the guest has not enabled the feature.
      Once we do this, we need not test OSPKE in the host anymore, because
      guest_CR4.PKE=1 implies host_CR4.PKE=1.
      
      The static PKU test is kept to elide the code on older CPUs.
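
      A hedged sketch of the guest-entry path after the change (field and
      helper names follow the KVM/VMX code as best understood; treat them
      as assumptions):

          if (static_cpu_has(X86_FEATURE_PKU) &&
              kvm_read_cr4_bits(vcpu, X86_CR4_PKE) &&
              vcpu->arch.pkru != vmx->host_pkru)
              __write_pkru(vcpu->arch.pkru);  /* only when it differs */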
      Suggested-by: Yang Zhang <zy107165@alibaba-inc.com>
      Fixes: 1be0e61c
      Cc: stable@vger.kernel.org
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: block guest protection keys unless the host has them enabled · c469268c
      Authored by Paolo Bonzini
      If the host has protection keys disabled, we cannot read and write the
      guest PKRU: RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0.  Block
      the PKU cpuid bit in that case.
      
      This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1.
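
      A hedged sketch of the CPUID masking (illustrative; the real check
      lives in KVM's CPUID setup):

          /* Hide PKU from the guest when the host has not enabled
           * protection keys (CR4.PKE=0 makes RDPKRU/WRPKRU #GP): */
          if (!boot_cpu_has(X86_FEATURE_OSPKE))
              entry->ecx &= ~F(PKU);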
      
      Fixes: 1be0e61c
      Cc: stable@vger.kernel.org
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • arm64: dts: exynos: remove i80-if-timings nodes · 88a5e22a
      Authored by Andrzej Hajda
      Since i80/command mode is determined at runtime by propagating info
      from the panel, this property can be removed.
      Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
      Signed-off-by: Inki Dae <inki.dae@samsung.com>