1. 31 Aug, 2018 1 commit
    • perf annotate: Properly interpret indirect call · 1dc27f63
      Committed by Martin Liška
      The patch changes the parsing of:
      
      	callq  *0x8(%rbx)
      
      from:
      
        0.26 │     → callq  *8
      
      to:
      
        0.26 │     → callq  *0x8(%rbx)
      
      In this case the address is followed by a register, so parsing only the
      address loses part of the operand.
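
To illustrate the distinction, here is a minimal Python sketch (not perf's actual C code) of a call-operand parser that keeps the full operand text when a register is involved. The names `CallTarget` and `parse_call_operand` are hypothetical, and the operand forms are assumed to be the objdump-style ones shown above.

```python
import re
from typing import NamedTuple, Optional


class CallTarget(NamedTuple):
    indirect: bool        # True when the operand starts with '*'
    addr: Optional[int]   # resolved address, or None if it depends on a register
    text: str             # operand text to display


# A register inside parentheses, e.g. "(%rbx)" or "(%r12)".
_REG_OPERAND = re.compile(r"\(%\w+")
# A leading hexadecimal number, with or without the "0x" prefix.
_HEX_PREFIX = re.compile(r"(?:0x)?([0-9a-fA-F]+)")


def _leading_hex(text: str) -> Optional[int]:
    match = _HEX_PREFIX.match(text)
    return int(match.group(1), 16) if match else None


def parse_call_operand(operand: str) -> CallTarget:
    """Parse the operand of a call instruction (hypothetical helper)."""
    operand = operand.strip()
    if operand.startswith("*"):
        body = operand[1:]
        if _REG_OPERAND.search(body):
            # Displacement plus register: the number alone is not the target,
            # so do not resolve an address and keep the whole operand.
            return CallTarget(True, None, operand)
        return CallTarget(True, _leading_hex(body), operand)
    return CallTarget(False, _leading_hex(operand), operand)
```

With this approach `*0x8(%rbx)` is reported as an indirect call with no resolved address, so a display layer can print the operand text unchanged instead of a truncated `*8`.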
      
      Committer testing:
      
      1) run 'perf record sleep 10'
      2) before applying the patch, run:
      
           perf annotate --stdio2 > /tmp/before
      
      3) after applying the patch, run:
      
           perf annotate --stdio2 > /tmp/after
      
      4) diff /tmp/before /tmp/after:
        --- /tmp/before 2018-08-28 11:16:03.238384143 -0300
        +++ /tmp/after  2018-08-28 11:15:39.335341042 -0300
        @@ -13274,7 +13274,7 @@
                      ↓ jle    128
                        hash_value = hash_table->hash_func (key);
                        mov    0x8(%rsp),%rdi
        -  0.91       → callq  *30
        +  0.91       → callq  *0x30(%r12)
                        mov    $0x2,%r8d
                        cmp    $0x2,%eax
                        node_hash = hash_table->hashes[node_index];
        @@ -13848,7 +13848,7 @@
                         mov    %r14,%rdi
                         sub    %rbx,%r13
                         mov    %r13,%rdx
        -              → callq  *38
        +              → callq  *0x38(%r15)
                         cmp    %rax,%r13
           1.91        ↓ je     240
                  1b4:   mov    $0xffffffff,%r13d
        @@ -14026,7 +14026,7 @@
                         mov    %rcx,-0x500(%rbp)
                         mov    %r15,%rsi
                         mov    %r14,%rdi
        -              → callq  *38
        +              → callq  *0x38(%rax)
                         mov    -0x500(%rbp),%rcx
                         cmp    %rax,%rcx
                       ↓ jne    9b0
      <SNIP tons of other such cases>
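
The before/after comparison from the testing steps can also be scripted. The sketch below uses Python's standard `difflib`; the function name `changed_lines` is hypothetical, and the two arguments are assumed to hold the text of `/tmp/before` and `/tmp/after`.

```python
import difflib
from itertools import islice
from typing import List


def changed_lines(before: str, after: str) -> List[str]:
    """Return only the removed ('-') and added ('+') lines between two dumps."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="/tmp/before",
        tofile="/tmp/after",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    )
    # The first two lines are the '---' / '+++' file headers; skip them,
    # then drop the '@@' hunk headers.
    return [line for line in islice(diff, 2, None) if not line.startswith("@@")]
```

Feeding it the two annotate dumps would list pairs such as `-  0.91       → callq  *30` followed by `+  0.91       → callq  *0x30(%r12)`.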
      Signed-off-by: Martin Liška <mliska@suse.cz>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Kim Phillips <kim.phillips@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/bd1f3932-be2b-85f9-7582-111ee0a43b07@suse.cz
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 23 Aug, 2018 1 commit
    • Merge tag 'perf-core-for-mingo-4.19-20180820' of... · 66e5db4a
      Committed by Ingo Molnar
      Merge tag 'perf-core-for-mingo-4.19-20180820' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      LLVM/clang/eBPF: (Arnaldo Carvalho de Melo)
      
       - Allow passing options to llc in addition to clang.
      
      Hardware tracing: (Jack Henschel)
      
       - Improve error message for PMU address filters, clarifying availability of
         that feature in hardware having hardware tracing such as Intel PT.
      
      Python interface: (Jiri Olsa)
      
       - Fix read_on_cpu() interface.
      
      ELF/DWARF libraries: (Jiri Olsa)
      
       - Fix handling of the combo compressed module file + decompressed associated
         debuginfo file.
      
      Build: (Rasmus Villemoes)
      
       - Disable parallelism for 'make clean', avoiding multiple submakes deleting
         the same files and causing the build to fail on systems such as Yocto.
      
      Kernel ABI copies: (Arnaldo Carvalho de Melo)
      
       - Update tools's copy of x86's cpufeatures.h.
      
       - Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'.
      
      Miscellaneous: (Steven Rostedt)
      
       - Change libtraceevent to SPDX License format.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 20 Aug, 2018 21 commits
  4. 18 Aug, 2018 1 commit
    • Merge tag 'perf-core-for-mingo-4.19-20180815' of... · 5804b110
      Committed by Ingo Molnar
      Merge tag 'perf-core-for-mingo-4.19-20180815' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      kernel:
      
      - kallsyms, x86: Export addresses of PTI entry trampolines (Alexander Shishkin)
      
      - kallsyms: Simplify update_iter_mod() (Adrian Hunter)
      
      - x86: Add entry trampolines to kcore (Adrian Hunter)
      
      Hardware tracing:
      
      - Fix auxtrace queue resize (Adrian Hunter)
      
      Arch specific:
      
      - Fix uninitialized ARM SPE record error variable (Kim Phillips)
      
      - Fix trace event post-processing in powerpc (Sandipan Das)
      
      Build:
      
      - Fix check-headers.sh AND list path of execution (Alexander Kapshuk)
      
      - Remove -mcet and -fcf-protection when building the python binding
        with older clang versions (Arnaldo Carvalho de Melo)
      
      - Make check-headers.sh check based on kernel dir (Jiri Olsa)
      
      - Move syscall_64.tbl check into check-headers.sh (Jiri Olsa)
      
      Infrastructure:
      
      - Check for null when copying nsinfo (Benno Evers)
      
      Libraries:
      
      - Rename libtraceevent prefixes, prep work for making it a shared
        library generally available (Tzvetomir Stoyanov (VMware))
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 15 Aug, 2018 8 commits
  6. 14 Aug, 2018 8 commits
    • Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 13e091b6
      Committed by Linus Torvalds
      Pull x86 timer updates from Thomas Gleixner:
       "Early TSC based time stamping to allow better boot time analysis.
      
        This comes with a general cleanup of the TSC calibration code which
        grew warts and duct taping over the years and removes 250 lines of
        code. Initiated and mostly implemented by Pavel with help from various
        folks"
      
      * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
        x86/kvmclock: Mark kvm_get_preset_lpj() as __init
        x86/tsc: Consolidate init code
        sched/clock: Disable interrupts when calling generic_sched_clock_init()
        timekeeping: Prevent false warning when persistent clock is not available
        sched/clock: Close a hole in sched_clock_init()
        x86/tsc: Make use of tsc_calibrate_cpu_early()
        x86/tsc: Split native_calibrate_cpu() into early and late parts
        sched/clock: Use static key for sched_clock_running
        sched/clock: Enable sched clock early
        sched/clock: Move sched clock initialization and merge with generic clock
        x86/tsc: Use TSC as sched clock early
        x86/tsc: Initialize cyc2ns when tsc frequency is determined
        x86/tsc: Calibrate tsc only once
        ARM/time: Remove read_boot_clock64()
        s390/time: Remove read_boot_clock64()
        timekeeping: Default boot time offset to local_clock()
        timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()
        s390/time: Add read_persistent_wall_and_boot_offset()
        x86/xen/time: Output xen sched_clock time from 0
        x86/xen/time: Initialize pv xen time in init_hypervisor_platform()
        ...
    • Merge branch 'x86/pti' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · eac34119
      Committed by Linus Torvalds
      Pull x86 PTI updates from Thomas Gleixner:
       "The Speck brigade sadly provides yet another large set of patches
        destroying the performance which we carefully built and preserved
      
         - PTI support for 32bit PAE. The missing counterpart to the 64bit
           PTI code implemented by Joerg.
      
         - A set of fixes for the Global Bit mechanics for non PCID CPUs which
           were setting the Global Bit too widely and therefore possibly
           exposing interesting memory needlessly.
      
         - Protection against userspace-userspace SpectreRSB
      
         - Support for the upcoming Enhanced IBRS mode, which is preferred
           over IBRS. Unfortunately we don't know the performance impact of
           this, but it's expected to be less horrible than the IBRS
           hammering.
      
         - Cleanups and simplifications"
      
      * 'x86/pti' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
        x86/mm/pti: Move user W+X check into pti_finalize()
        x86/relocs: Add __end_rodata_aligned to S_REL
        x86/mm/pti: Clone kernel-image on PTE level for 32 bit
        x86/mm/pti: Don't clear permissions in pti_clone_pmd()
        x86/mm/pti: Fix 32 bit PCID check
        x86/mm/init: Remove freed kernel image areas from alias mapping
        x86/mm/init: Add helper for freeing kernel image pages
        x86/mm/init: Pass unconverted symbol addresses to free_init_pages()
        mm: Allow non-direct-map arguments to free_reserved_area()
        x86/mm/pti: Clear Global bit more aggressively
        x86/speculation: Support Enhanced IBRS on future CPUs
        x86/speculation: Protect against userspace-userspace spectreRSB
        x86/kexec: Allocate 8k PGDs for PTI
        Revert "perf/core: Make sure the ring-buffer is mapped in all page-tables"
        x86/mm: Remove in_nmi() warning from vmalloc_fault()
        x86/entry/32: Check for VM86 mode in slow-path check
        perf/core: Make sure the ring-buffer is mapped in all page-tables
        x86/pti: Check the return value of pti_user_pagetable_walk_pmd()
        x86/pti: Check the return value of pti_user_pagetable_walk_p4d()
        x86/entry/32: Add debug code to check entry/exit CR3
        ...
    • Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d191c82d
      Committed by Linus Torvalds
      Pull x86 vdso update from Thomas Gleixner:
       "Use LD to link the VDSO libs instead of indirecting through CC which
        causes build failures with Clang"
      
      * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: vdso: Use $LD instead of $CC to link
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4d5ac4b8
      Committed by Linus Torvalds
      Pull misc x86 fixes from Thomas Gleixner:
       "Two fixes for x86:
      
         - Provide a declaration for native_save_fl() which unbreaks the
           wreckage caused by making it 'extern inline'.
      
         - Fix the failing paravirt patching which is supposed to replace
           indirect with direct calls. The wreckage is caused by an incorrect
           clobber test"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
        x86/irqflags: Provide a declaration for native_save_fl
    • Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 203b4fc9
      Committed by Linus Torvalds
      Pull x86 mm updates from Thomas Gleixner:
      
       - Make lazy TLB mode even lazier to avoid pointless switch_mm()
         operations, which reduces CPU load by 1-2% for memcache workloads
      
       - Small cleanups and improvements all over the place
      
      * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Remove redundant check for kmem_cache_create()
        arm/asm/tlb.h: Fix build error implicit func declaration
        x86/mm/tlb: Make clear_asid_other() static
        x86/mm/tlb: Skip atomic operations for 'init_mm' in switch_mm_irqs_off()
        x86/mm/tlb: Always use lazy TLB mode
        x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs
        x86/mm/tlb: Make lazy TLB mode lazier
        x86/mm/tlb: Restructure switch_mm_irqs_off()
        x86/mm/tlb: Leave lazy TLB mode at page table free time
        mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids
        x86/mm: Add TLB purge to free pmd/pte page interfaces
        ioremap: Update pgtable free interfaces with addr
        x86/mm: Disable ioremap free page handling on x86-PAE
    • Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7edcf0d3
      Committed by Linus Torvalds
      Pull x86 platform updates from Thomas Gleixner:
       "Trivial cleanups and improvements"
      
      * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/platform/UV: Remove redundant check of p == q
        x86/platform/olpc: Use PTR_ERR_OR_ZERO()
        x86/platform/UV: Mark memblock related init code and data correctly
    • Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 30de24c7
      Committed by Linus Torvalds
      Pull x86 cache QoS (RDT/CAR) updates from Thomas Gleixner:
       "Add support for pseudo-locked cache regions.
      
        On certain CPUs, Cache Allocation Technology (CAT) makes it possible to
        isolate a region of cache and 'lock' it. Cache pseudo-locking builds on the fact
        that a CPU can still read and write data pre-allocated outside its
        current allocated area on cache hit. With cache pseudo-locking data
        can be preloaded into a reserved portion of cache that no application
        can fill, and from that point on will only serve cache hits. The cache
        pseudo-locked memory is made accessible to user space where an
        application can map it into its virtual address space and thus have a
        region of memory with reduced average read latency.
      
        The locking is not perfect and gets totally screwed by WBINVD and
        similar mechanisms, but it provides a reasonable enhancement for
        certain types of latency sensitive applications.
      
        The implementation extends the current CAT mechanism and provides a
        generally useful exclusive CAT mode on which it builds the extra
        pseudo-locked regions"
      
      * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
        x86/intel_rdt: Disable PMU access
        x86/intel_rdt: Fix possible circular lock dependency
        x86/intel_rdt: Make CPU information accessible for pseudo-locked regions
        x86/intel_rdt: Support restoration of subset of permissions
        x86/intel_rdt: Fix cleanup of plr structure on error
        x86/intel_rdt: Move pseudo_lock_region_clear()
        x86/intel_rdt: Limit C-states dynamically when pseudo-locking active
        x86/intel_rdt: Support L3 cache performance event of Broadwell
        x86/intel_rdt: More precise L2 hit/miss measurements
        x86/intel_rdt: Create character device exposing pseudo-locked region
        x86/intel_rdt: Create debugfs files for pseudo-locking testing
        x86/intel_rdt: Create resctrl debug area
        x86/intel_rdt: Ensure RDT cleanup on exit
        x86/intel_rdt: Resctrl files reflect pseudo-locked information
        x86/intel_rdt: Support creation/removal of pseudo-locked region
        x86/intel_rdt: Pseudo-lock region creation/removal core
        x86/intel_rdt: Discover supported platforms via prefetch disable bits
        x86/intel_rdt: Add utilities to test pseudo-locked region possibility
        x86/intel_rdt: Split resource group removal in two
        x86/intel_rdt: Enable entering of pseudo-locksetup mode
        ...
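
As an illustration of the user-space side described in the message above, the following Python sketch maps a pseudo-locked region into a process. It is a sketch under assumptions, not the kernel's reference example: the device path `/dev/pseudo_lock/newlock` follows the convention in the kernel's resctrl documentation, the region name `newlock` and the helper `map_region` are hypothetical, and the region is assumed to have been created through resctrl beforehand.

```python
import mmap
import os


def map_region(path: str, length: int) -> mmap.mmap:
    """Map `length` bytes of the file or character device at `path`.

    For a pseudo-locked region, `path` would be something like
    /dev/pseudo_lock/newlock (assumed name, see the text above).
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # MAP_SHARED so accesses go to the device-backed memory itself.
        return mmap.mmap(
            fd,
            length,
            flags=mmap.MAP_SHARED,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
    finally:
        # mmap keeps its own duplicate of the descriptor, so ours can be closed.
        os.close(fd)
```

A caller would then read and write the returned object like a byte array, for example `region = map_region("/dev/pseudo_lock/newlock", mmap.PAGESIZE)`. The reduced read latency mentioned in the message only applies while the task runs on a CPU associated with the locked cache, so setting CPU affinity first is assumed to be the caller's job.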
    • Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f4990264
      Committed by Linus Torvalds
      Pull x86/hyper-v update from Thomas Gleixner:
       "Add fast hypercall support for guests running on the Microsoft HyperV(isor)"
      
      * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/hyper-v: Fix wrong merge conflict resolution
        x86/hyper-v: Check for VP_INVAL in hyperv_flush_tlb_others()
        x86/hyper-v: Check cpumask_to_vpset() return value in hyperv_flush_tlb_others_ex()
        x86/hyper-v: Trace PV IPI send
        x86/hyper-v: Use cheaper HVCALL_SEND_IPI hypercall when possible
        x86/hyper-v: Use 'fast' hypercall for HVCALL_SEND_IPI
        x86/hyper-v: Implement hv_do_fast_hypercall16
        x86/hyper-v: Use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible