1. 05 May 2016, 6 commits
  2. 28 April 2016, 4 commits
    • perf/x86/intel: Fix incorrect lbr_sel_mask value · cf3beb7c
      Authored by Kan Liang
      This patch fixes a bug which was introduced by:
      
       b16a5b52 ("perf/x86: Add option to disable reading branch flags/cycles")
      
      In that patch, lbr_sel_mask is used to mask lbr_select, but LBR_SEL_MASK
      doesn't include the bit for LBR_CALL_STACK, so the LBR call stack will
      never be set in lbr_select.
      
      This patch corrects LBR_SEL_MASK by including all valid bits of
      LBR_SELECT. Also, the LBR_CALL_STACK bit differs from the other bits in
      LBR_SELECT: it does not operate in suppress mode, so it needs special
      handling in intel_pmu_setup_hw_lbr_filter().
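      A rough sketch of the shape of the fix (mask value and helper name are
      illustrative; see arch/x86/events/intel/lbr.c for the real definitions).
      Most LBR_SELECT bits operate in "suppress" mode (1 = filter out), so the
      driver inverts them, while LBR_CALL_STACK is an enable bit that must be
      excluded from that inversion:

        /* Sketch only: bit position per the SDM, mask value illustrative. */
        #define LBR_CALL_STACK  (1ULL << 9)
        #define LBR_SEL_MASK    0x3ffULL  /* now includes LBR_CALL_STACK */

        static u64 build_lbr_select(u64 branch_filter_mask)
        {
                /* Invert only the suppress-mode bits; the call-stack
                 * enable bit passes through unchanged. */
                return branch_filter_mask ^ (LBR_SEL_MASK & ~LBR_CALL_STACK);
        }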
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1461231010-4399-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cf3beb7c
    • perf/x86/intel/pt: Don't die on VMXON · 1c5ac21a
      Authored by Alexander Shishkin
      Some versions of Intel PT do not support tracing across VMXON. More
      specifically, VMXON clears the TraceEn control bit, and any attempt to
      set it again before VMXOFF raises a #GP, which in the current state of
      things will crash the kernel. Namely:
      
        $ perf record -e intel_pt// kvm -nographic
      
      on such a machine will kill it.
      
      To avoid this, notify the intel_pt driver before VMXON and after
      VMXOFF so that it knows when not to enable itself.
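      A minimal sketch of that handshake (modeled on the upstream
      intel_pt_handle_vmx() helper, but simplified; the per-cpu flag and the
      pt_config_start() name here are illustrative):

        static DEFINE_PER_CPU(bool, vmx_on);

        /* Called by KVM: intel_pt_handle_vmx(1) before VMXON,
         * intel_pt_handle_vmx(0) after VMXOFF. */
        void intel_pt_handle_vmx(int on)
        {
                this_cpu_write(vmx_on, on);
                /* the real driver also pauses/resumes an active trace here */
        }

        static void pt_config_start(void)
        {
                if (this_cpu_read(vmx_on))
                        return;  /* setting TraceEn now would #GP */
                /* ... wrmsrl(MSR_IA32_RTIT_CTL, ...) ... */
        }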
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1c5ac21a
    • perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX · 0a25556f
      Authored by Adam Borowski
      The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is
      referenced by filter_events(), which expects undefined events to have a
      value of 0.
      
      Found via UBSAN:
      
        UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30
        index 9 is out of range for type 'u64 [9]'
        UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9
        load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64'
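      Sketched, the fix is to size the map by the enum terminator so that
      every generic event id is in bounds and unimplemented entries read as
      zero (the event codes shown are the usual AMD ones, for illustration):

        static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] = {
                [PERF_COUNT_HW_CPU_CYCLES]    = 0x0076,
                [PERF_COUNT_HW_INSTRUCTIONS]  = 0x00c0,
                /* ... */
                /* PERF_COUNT_HW_REF_CPU_CYCLES is deliberately absent:
                 * designated initializers leave it 0, which is what
                 * filter_events() expects for undefined events. */
        };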
      Signed-off-by: Adam Borowski <kilobyte@angband.pl>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0a25556f
    • x86/apic: Handle zero vector gracefully in clear_vector_irq() · 1bdb8970
      Authored by Keith Busch
      If x86_vector_alloc_irq() fails, x86_vector_free_irqs() is invoked to
      clean up the already allocated vectors. This subsequently calls
      clear_vector_irq().
      
      The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
      clear_vector_irq().
      
      We cannot suppress the call to x86_vector_free_irqs() for the failed
      interrupt, because the other data related to this irq must be cleaned up as
      well. So calling clear_vector_irq() with vector == 0 is legitimate.
      
      Remove the BUG_ON() and return if the vector is zero.
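      In essence (a sketch only; the function lives in
      arch/x86/kernel/apic/vector.c and does more bookkeeping than shown):

        static void clear_vector_irq(int irq, struct apic_chip_data *data)
        {
                int vector = data->cfg.vector;

                if (!vector)    /* allocation failed earlier: nothing to do */
                        return; /* was: BUG_ON(!vector) */

                /* ... release the vector and per-cpu mappings ... */
        }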
      
      [ tglx: Massaged changelog ]
      
      Fixes: b5dc8e6c "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      1bdb8970
  3. 27 April 2016, 6 commits
    • perf core: Allow setting up max frame stack depth via sysctl · c5dfd78e
      Authored by Arnaldo Carvalho de Melo
      The default remains 127, which is good for most cases and not even hit
      most of the time, but for some workloads, as reported by Brendan, 1024+
      deep frames are appearing on the radar for things like groovy and ruby.
      
      And in some workloads putting a _lower_ cap on this may make sense. A
      per-event setting still needs to be put in place, though.
      
      The new file is:
      
        # cat /proc/sys/kernel/perf_event_max_stack
        127
      
      Changing it:
      
        # echo 256 > /proc/sys/kernel/perf_event_max_stack
        # cat /proc/sys/kernel/perf_event_max_stack
        256
      
      But as soon as there is some event using callchains we get:
      
        # echo 512 > /proc/sys/kernel/perf_event_max_stack
        -bash: echo: write error: Device or resource busy
        #
      
      This is because we only allocate the callchain percpu data structures
      when there is a user, which allows for changing the max easily; it's
      just a matter of having no callchain users at that point.
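      A sketch of the sysctl handler behind this behaviour (close to
      kernel/events/callchain.c, but simplified; the mutex and counter names
      are assumptions):

        int perf_event_max_stack_handler(struct ctl_table *table, int write,
                                         void __user *buffer, size_t *lenp,
                                         loff_t *ppos)
        {
                int new_value = sysctl_perf_event_max_stack, ret;
                struct ctl_table new_table = *table;

                new_table.data = &new_value;
                ret = proc_dointvec_minmax(&new_table, write, buffer, lenp, ppos);
                if (ret || !write)
                        return ret;

                mutex_lock(&callchain_mutex);
                if (atomic_read(&nr_callchain_events))
                        ret = -EBUSY;   /* callchains in use: the EBUSY seen above */
                else
                        sysctl_perf_event_max_stack = new_value;
                mutex_unlock(&callchain_mutex);

                return ret;
        }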
      Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c5dfd78e
    • ARC: add support for reserved memory defined by device tree · 1b10cb21
      Authored by Alexey Brodkin
      Enable reserved memory initialization from device tree.
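      In practice this typically amounts to invoking the generic FDT scanner
      once memblock knows about all RAM; a sketch (the placement inside ARC's
      setup code is an assumption):

        void __init setup_arch_memory(void)
        {
                /* ... memblock_add()/memblock_reserve() of kernel regions ... */

                /* parse /reserved-memory and reserve the regions it describes */
                early_init_fdt_scan_reserved_mem();
        }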
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      1b10cb21
    • ARC: support generic per-device coherent dma mem · 32ed9a0e
      Authored by Alexey Brodkin
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      32ed9a0e
    • nios2: memset: use the right constraint modifier for the %4 output operand · a8950e49
      Authored by Romain Perier
      Depending on the size of the area to be memset'ed, the nios2 memset
      implementation either uses a naive loop (for buffers smaller than or
      equal to 8 bytes) or a more optimized implementation (for buffers larger
      than 8 bytes). The latter does 4-byte stores rather than 1-byte stores
      to speed up memset.
      
      However, we discovered that on our nios2 platform, memset() was not properly setting the
      buffer to the expected value. A memset of 0xff would not set the entire buffer to 0xff, but to:
      
      0xff 0x00 0xff 0x00 0xff 0x00 0xff 0x00 ...
      
      Which is obviously incorrect. Our investigation has revealed that the problem lies in the
      incorrect constraints used in the inline assembly.
      
      The following piece of assembly, from the nios2 memset implementation,
      is supposed to create a 4-byte value that repeats the 1-byte memset
      pattern four times:
      
      /* fill8 %3, %5 (c & 0xff) */
      "       slli    %4, %5, 8\n"
      "       or      %4, %4, %5\n"
      "       slli    %3, %4, 16\n"
      "       or      %3, %3, %4\n"
      
      However, depending on the compiler and optimization level, this code might be compiled as:
      
      34:	280a923a 	slli	r5,r5,8
      38:	294ab03a 	or	r5,r5,r5
      3c:	2808943a 	slli	r4,r5,16
      40:	2148b03a 	or	r4,r4,r5
      
      This is wrong because r5 gets used both for %5 and %4, which leads to
      the final pattern stored in r4 being 0xff00ff00 rather than the
      expected 0xffffffff.
      
      %4 is defined with the "=r" constraint, i.e. as an output operand.
      However, as explained in
      http://www.ethernut.de/en/documents/arm-inline-asm.html, this does not
      prevent gcc from using the same register for an output operand (%4) and
      an input operand (%5). The constraint modifier '&' marks the operand as
      earlyclobber: it is written before the inputs are consumed, so gcc must
      not assign it the same register as any input. With this change, we get
      the following assembly output:
      
      34:	2810923a 	slli	r8,r5,8
      38:	4150b03a 	or	r8,r8,r5
      3c:	400e943a 	slli	r7,r8,16
      40:	3a0eb03a 	or	r7,r7,r8
      
      Which correctly produces the 0xffffffff pattern when 0xff is passed as the memset() pattern.
      
      It is worth mentioning the observed consequence of this bug: we were
      hitting the kernel BUG() in mm/bootmem.c:__free() that verifies, when
      marking a page as free, that it was previously marked as occupied (i.e.
      that the bit was set to 1). The entire bootmem bitmap is set to 0xff
      via a memset() during bootmem initialization. The bootmem_free() call
      right after the initialization was finding some bits set to 0, which
      didn't make sense since the bitmap had just been memset'ed to 0xff.
      Except that, due to the bug explained above, the bitmap was in fact
      initialized to 0xff00ff00.
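      A self-contained sketch of the pattern and the fix (nios2 mnemonics;
      the helper name is illustrative):

        #include <stdint.h>

        /* Replicate the low byte of c across a 32-bit word. The '&'
         * earlyclobber on both outputs is the fix: without it, gcc may
         * allocate an output into the same register as input %2 and
         * clobber c before its last use, yielding 0xff00ff00. */
        static inline uint32_t fill8(uint32_t c)
        {
                uint32_t out, tmp;

                __asm__ ("slli  %1, %2, 8\n\t"   /* tmp = c << 8     */
                         "or    %1, %1, %2\n\t"  /* tmp |= c         */
                         "slli  %0, %1, 16\n\t"  /* out = tmp << 16  */
                         "or    %0, %0, %1"      /* out |= tmp       */
                         : "=&r" (out), "=&r" (tmp)
                         : "r" (c & 0xff));
                return out;
        }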
      
      Thanks to Marek Vasut for his help and feedback.
      Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
      Acked-by: Marek Vasut <marex@denx.de>
      Acked-by: Ley Foon Tan <lftan@altera.com>
      a8950e49
    • powerpc: wire up preadv2 and pwritev2 syscalls · d701cca6
      Authored by Rui Salvaterra
      Wire up preadv2/pwritev2 in the same way as preadv/pwritev. Fixes two
      build warnings on ppc64.
      
      mpe: Lightly tested with fio (slightly hacked to add the syscall
      wrappers):
      
        fio-4217  [009] ....  1304.635300: sys_preadv2(fd: 3, vec:
        10025821de0, vlen: 1, pos_l: 6253000, pos_h: 0, flags: 1)
        fio-4217  [009] ....  1304.635474: sys_preadv2 -> 0x1000
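      For reference, a userspace wrapper of the kind fio was hacked to use
      might look like this sketch (the syscall number is an assumption; check
      asm/unistd.h, and note the 64-bit position is passed as a low/high
      pair, as the trace above shows):

        #include <stdint.h>
        #include <sys/syscall.h>
        #include <sys/uio.h>
        #include <unistd.h>

        #ifndef __NR_preadv2
        #define __NR_preadv2 380   /* powerpc value at the time; verify locally */
        #endif

        static ssize_t my_preadv2(int fd, const struct iovec *iov, int vlen,
                                  off_t pos, int flags)
        {
                return syscall(__NR_preadv2, fd, iov, vlen,
                               (unsigned long)pos,
                               (unsigned long)((uint64_t)pos >> 32), flags);
        }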
      Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d701cca6
    • Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging" · e16d8a6c
      Authored by Andy Lutomirski
      This reverts commit 320d25b6.
      
      This change was problematic for a couple of reasons:
      
      1. It missed some entry points (Xen things and 64-bit native).
      
      2. The entry it changed can be executed more than once.  This isn't
         really a problem, but it conflated per-cpu state setup and global
         state setup.
      
      3. It broke 64-bit non-NX.  64-bit non-NX worked the other way around from
         32-bit -- __supported_pte_mask had NX set initially and was *cleared*
         in x86_configure_nx.  With the patch applied, it never got cleared.
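      For reference, a sketch of the clearing path in question, modeled on
      x86_configure_nx() (simplified; see arch/x86/mm/setup_nx.c):

        void x86_configure_nx(void)
        {
                if (boot_cpu_has(X86_FEATURE_NX) && !disable_nx)
                        __supported_pte_mask |= _PAGE_NX;
                else
                        /* the clearing path the reverted patch defeated */
                        __supported_pte_mask &= ~_PAGE_NX;
        }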
      Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/59bd15f7f4b56b633a611b7f70876c6d2ad01a98.1461685884.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e16d8a6c
  4. 24 April 2016, 1 commit
  5. 23 April 2016, 8 commits
  6. 22 April 2016, 3 commits
    • ARCv2: Enable LOCKDEP · d9676fa1
      Authored by Evgeny Voevodin
      - The asm helpers for calling into the irq tracer were missing.
      
      - Add calls to the above helpers in the low-level assembly entry code
        for ARCv2.
      
      - irq_save() uses CLRI to disable interrupts and returns the previous
        interrupt state (in STATUS32) in a specific encoding (not the raw
        value of STATUS32). This is usable with SETI in irq_restore().
        However, save_flags() reads the raw value of STATUS32, which doesn't
        pair with irq_save()/irq_restore() and thus needs fixing (see the
        sketch below).
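      A sketch of the CLRI/SETI pairing described in the last item (close to
      arch/arc/include/asm/irqflags-arcv2.h, simplified):

        /* CLRI disables interrupts and returns the prior state in the
         * encoded form that SETI expects, so the two always pair up. */
        static inline unsigned long arch_local_irq_save(void)
        {
                unsigned long flags;

                __asm__ __volatile__("clri %0" : "=r" (flags) : : "memory");
                return flags;
        }

        static inline void arch_local_irq_restore(unsigned long flags)
        {
                __asm__ __volatile__("seti %0" : : "r" (flags) : "memory");
        }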
      Signed-off-by: Evgeny Voevodin <evgeny.voevodin@intel.com>
      [vgupta: updated changelog and also added some comments]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      d9676fa1
    • x86/mm/xen: Suppress hugetlbfs in PV guests · 103f6112
      Authored by Jan Beulich
      Huge pages are not normally available to PV guests. Not suppressing
      hugetlbfs use results in an endless loop of page faults when user mode
      code tries to access a hugetlbfs mapped area, since the hypervisor
      refuses to create such PTEs but error indications can't be propagated
      out of xen_set_pte_at(), just like for various of its siblings. Once
      the process is killed, it ends in an oops like this:
      
        kernel BUG at .../fs/hugetlbfs/inode.c:428!
        invalid opcode: 0000 [#1] SMP
        ...
        RIP: e030:[<ffffffff811c333b>]  [<ffffffff811c333b>] remove_inode_hugepages+0x25b/0x320
        ...
        Call Trace:
         [<ffffffff811c3415>] hugetlbfs_evict_inode+0x15/0x40
         [<ffffffff81167b3d>] evict+0xbd/0x1b0
         [<ffffffff8116514a>] __dentry_kill+0x19a/0x1f0
         [<ffffffff81165b0e>] dput+0x1fe/0x220
         [<ffffffff81150535>] __fput+0x155/0x200
         [<ffffffff81079fc0>] task_work_run+0x60/0xa0
         [<ffffffff81063510>] do_exit+0x160/0x400
         [<ffffffff810637eb>] do_group_exit+0x3b/0xa0
         [<ffffffff8106e8bd>] get_signal+0x1ed/0x470
         [<ffffffff8100f854>] do_signal+0x14/0x110
         [<ffffffff810030e9>] prepare_exit_to_usermode+0xe9/0xf0
         [<ffffffff814178a5>] retint_user+0x8/0x13
      
      This is CVE-2016-3961 / XSA-174.
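      The fix itself is tiny; in essence (sketched, and relying on Xen PV
      guests not advertising PSE in CPUID):

        /* arch/x86/include/asm/hugetlb.h, in essence: hugetlbfs is only
         * usable when the guest-visible CPU has PSE; Xen PV guests filter
         * PSE out of CPUID, so this evaluates to false for them. */
        #define hugepages_supported() cpu_has_pse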
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <JGross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: stable@vger.kernel.org
      Cc: xen-devel <xen-devel@lists.xenproject.org>
      Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      103f6112
    • arm64: Fix EL1/EL2 early init inconsistencies with VHE · 882416c1
      Authored by Dave Martin
      When using the Virtualisation Host Extensions, EL1 is not used in
      the host and requires no separate configuration.
      
      In addition, with VHE enabled, non-hyp-specific EL2 configuration
      that does not need to be done early will be done anyway in
      __cpu_setup via the _EL1 system register aliases.  In particular,
      the layout and definition of CPTR_EL2 are changed by enabling VHE
      so that they resemble CPACR_EL1, so existing code to initialise
      CPTR_EL2 becomes architecturally wrong in this case.
      
      This patch simply skips the affected initialisation code in the
      VHE case.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      882416c1
  7. 21 April 2016, 2 commits
    • s390/mm: fix asce_bits handling with dynamic pagetable levels · 723cacbd
      Authored by Gerald Schaefer
      There is a race with multi-threaded applications between context switch and
      pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and
      mm->context.asce_bits, without holding any locks. A concurrent mmap with a
      pagetable upgrade on another thread in crst_table_upgrade() could already
      have set new asce_bits, but not yet the new mm->pgd. This would result in a
      corrupt user_asce in switch_mm(), and eventually in a kernel panic from a
      translation exception.
      
      Fix this by storing the complete asce instead of just the asce_bits, which
      can then be read atomically from switch_mm(), so that it either sees the
      old value or the new value, but no mixture. Both cases are OK. Having the
      old value would result in a page fault on access to the higher level memory,
      but the fault handler would see the new mm->pgd, if it was a valid access
      after the mmap on the other thread has completed. So, in the worst
      case, we would have a page fault loop for the racing thread until the
      next time slice.
      
      Also remove dead code and simplify the upgrade/downgrade path, there are no
      upgrades from 2 levels, and only downgrades from 3 levels for compat tasks.
      There are also no concurrent upgrades, because the mmap_sem is held with
      down_write() in do_mmap, so the flush and table checks during upgrade can
      be removed.
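      A sketch of the fixed context-switch path (simplified; the real
      switch_mm() does more):

        /* mm->context now caches the complete ASCE, so switch_mm() does a
         * single word-sized access: a racing pagetable upgrade is observed
         * either entirely or not at all, never as a mixture. */
        static inline void switch_mm(struct mm_struct *prev,
                                     struct mm_struct *next,
                                     struct task_struct *tsk)
        {
                if (prev == next)
                        return;
                S390_lowcore.user_asce = next->context.asce;
                /* ... reload control/access registers as before ... */
        }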
      Reported-by: Michael Munday <munday@ca.ibm.com>
      Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      723cacbd
    • s390/pci: fix use after free in dma_init · dba59909
      Authored by Sebastian Ott
      After a failure during registration of the dma_table (because the
      function is in the error state) we free its memory but don't reset the
      associated pointer to zero.
      
      When we then receive a notification from firmware (about the function
      being in error state), we'll try to walk and free the dma_table again.
      
      Fix this by resetting the dma_table pointer. In addition, make sure
      that we free the iommu_bitmap when appropriate.
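      The generic pattern behind the fix, sketched (field names follow the
      s390 zpci code, but the error path is simplified):

        out_err:
                dma_free_cpu_table(zdev->dma_table);
                zdev->dma_table = NULL;    /* a later notification-driven
                                              teardown must not walk the
                                              freed table */
                vfree(zdev->iommu_bitmap);
                zdev->iommu_bitmap = NULL;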
      Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      dba59909
  8. 20 April 2016, 6 commits
  9. 19 April 2016, 1 commit
  10. 18 April 2016, 3 commits