1. 15 2月, 2018 1 次提交
    • I
      tools/headers: Synchronize kernel ABI headers, v4.16-rc1 · f091f1d6
      Ingo Molnar 提交于
      Sync the following tooling headers with the latest kernel version:
      
        tools/arch/powerpc/include/uapi/asm/kvm.h
        tools/arch/x86/include/asm/cpufeatures.h
        tools/include/uapi/drm/i915_drm.h
        tools/include/uapi/linux/if_link.h
        tools/include/uapi/linux/kvm.h
      
      All the changes are new ABI additions which don't impact their use
      in existing tooling.
      
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f091f1d6
  2. 09 2月, 2018 1 次提交
  3. 07 2月, 2018 1 次提交
    • C
      lib: optimize cpumask_next_and() · 0ade34c3
      Clement Courbet 提交于
      We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
      It's essentially a joined iteration in search for a non-zero bit, which is
      currently implemented as a lookup join (find a nonzero bit on the lhs,
      lookup the rhs to see if it's set there).
      
      Implement a direct join (find a nonzero bit on the incrementally built
      join).  Also add generic bitmap benchmarks in the new `test_find_bit`
      module for new function (see `find_next_and_bit` in [2] and [3] below).
      
      For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
      faster with a geometric mean of 2.1 on 32 CPUs [1].  No impact on memory
      usage.  Note that on Arm, the new pure-C implementation still outperforms
      the old one that uses a mix of C and asm (`find_next_bit`) [3].
      
      [1] Approximate benchmark code:
      
      ```
        unsigned long src1p[nr_cpumask_longs] = {pattern1};
        unsigned long src2p[nr_cpumask_longs] = {pattern2};
        for (/*a bunch of repetitions*/) {
          for (int n = -1; n <= nr_cpu_ids; ++n) {
            asm volatile("" : "+rm"(src1p)); // prevent any optimization
            asm volatile("" : "+rm"(src2p));
            unsigned long result = cpumask_next_and(n, src1p, src2p);
            asm volatile("" : "+rm"(result));
          }
        }
      ```
      
      Results:
      pattern1    pattern2     time_before/time_after
      0x0000ffff  0x0000ffff   1.65
      0x0000ffff  0x00005555   2.24
      0x0000ffff  0x00001111   2.94
      0x0000ffff  0x00000000   14.0
      0x00005555  0x0000ffff   1.67
      0x00005555  0x00005555   1.71
      0x00005555  0x00001111   1.90
      0x00005555  0x00000000   6.58
      0x00001111  0x0000ffff   1.46
      0x00001111  0x00005555   1.49
      0x00001111  0x00001111   1.45
      0x00001111  0x00000000   3.10
      0x00000000  0x0000ffff   1.18
      0x00000000  0x00005555   1.18
      0x00000000  0x00001111   1.17
      0x00000000  0x00000000   1.25
      -----------------------------
                     geo.mean  2.06
      
      [2] test_find_next_bit, X86 (skylake)
      
       [ 3913.477422] Start testing find_bit() with random-filled bitmap
       [ 3913.477847] find_next_bit: 160868 cycles, 16484 iterations
       [ 3913.477933] find_next_zero_bit: 169542 cycles, 16285 iterations
       [ 3913.478036] find_last_bit: 201638 cycles, 16483 iterations
       [ 3913.480214] find_first_bit: 4353244 cycles, 16484 iterations
       [ 3913.480216] Start testing find_next_and_bit() with random-filled
       bitmap
       [ 3913.481074] find_next_and_bit: 89604 cycles, 8216 iterations
       [ 3913.481075] Start testing find_bit() with sparse bitmap
       [ 3913.481078] find_next_bit: 2536 cycles, 66 iterations
       [ 3913.481252] find_next_zero_bit: 344404 cycles, 32703 iterations
       [ 3913.481255] find_last_bit: 2006 cycles, 66 iterations
       [ 3913.481265] find_first_bit: 17488 cycles, 66 iterations
       [ 3913.481266] Start testing find_next_and_bit() with sparse bitmap
       [ 3913.481272] find_next_and_bit: 764 cycles, 1 iterations
      
      [3] test_find_next_bit, arm (v7 odroid XU3).
      
      [  267.206928] Start testing find_bit() with random-filled bitmap
      [  267.214752] find_next_bit: 4474 cycles, 16419 iterations
      [  267.221850] find_next_zero_bit: 5976 cycles, 16350 iterations
      [  267.229294] find_last_bit: 4209 cycles, 16419 iterations
      [  267.279131] find_first_bit: 1032991 cycles, 16420 iterations
      [  267.286265] Start testing find_next_and_bit() with random-filled
      bitmap
      [  267.302386] find_next_and_bit: 2290 cycles, 8140 iterations
      [  267.309422] Start testing find_bit() with sparse bitmap
      [  267.316054] find_next_bit: 191 cycles, 66 iterations
      [  267.322726] find_next_zero_bit: 8758 cycles, 32703 iterations
      [  267.329803] find_last_bit: 84 cycles, 66 iterations
      [  267.336169] find_first_bit: 4118 cycles, 66 iterations
      [  267.342627] Start testing find_next_and_bit() with sparse bitmap
      [  267.356919] find_next_and_bit: 91 cycles, 1 iterations
      
      [courbet@google.com: v6]
        Link: http://lkml.kernel.org/r/20171129095715.23430-1-courbet@google.com
      [geert@linux-m68k.org: m68k/bitops: always include <asm-generic/bitops/find.h>]
        Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
      Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.comSigned-off-by: NClement Courbet <courbet@google.com>
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: Yury Norov <ynorov@caviumnetworks.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ade34c3
  4. 03 2月, 2018 4 次提交
    • E
      tools: add netlink.h and if_link.h in tools uapi · dc2b9f19
      Eric Leblond 提交于
      The headers are necessary for libbpf compilation on system with older
      version of the headers.
      Signed-off-by: NEric Leblond <eric@regit.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      dc2b9f19
    • A
      tools headers: Synchronize uapi/linux/sched.h · 7a16c7e1
      Arnaldo Carvalho de Melo 提交于
      To get the tools copy updated with the changes in 34be3930
      ("sched/deadline: Implement "runtime overrun signal" support"), that
      cause no effect on the tools, will be used when we start copying the
      sched_attr struct argument to the sched_get/setattr syscalls.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-8rododhs87x8hv9k83qcdtne@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7a16c7e1
    • A
      tools headers: Sync {tools/,}arch/powerpc/include/uapi/asm/kvm.h · 1b8f5160
      Arnaldo Carvalho de Melo 提交于
      The changes in the 3214d01f ("KVM: PPC: Book3S: Provide information
      about hardware/firmware CVE workarounds") commit right now will not
      produce any change in the tools, but that is because we still need to
      improve tools/perf/trace/beauty/kvm_ioctl.sh to build per arch string
      tables, so that we avoid assigning multiple times to the same command
      string entry, i.e. multiple defines, for different arches, have the same
      value, causing this:
      
        In file included from trace/beauty/ioctl.c:82:0:
        /tmp/build/perf/trace/beauty/generated/ioctl/kvm_ioctl_array.c: In function ‘ioctl__scnprintf_kvm_cmd’:
        /tmp/build/perf/trace/beauty/generated/ioctl/kvm_ioctl_array.c:76:11: error: initialized field overwritten [-Werror=override-init]
        /tmp/build/perf/trace/beauty/generated/ioctl/kvm_ioctl_array.c:88:11: note: (near initialization for ‘kvm_ioctl_cmds[165]’)
        /tmp/build/perf/trace/beauty/generated/ioctl/kvm_ioctl_array.c:90:11: error: initialized field overwritten [-Werror=override-init]
          [0xa6] = "PPC_GET_SMMU_INFO",
                   ^~~~~~~~~~~~~~~~~~~
      
      So the onlye effect of updating the tools/ copy of ppc's kvm.h header
      is to silence these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
        Warning: Kernel ABI header at 'tools/arch/powerpc/include/uapi/asm/kvm.h' differs from latest version at 'arch/powerpc/include/uapi/asm/kvm.h'
      
      At some point we should do what we did for the errno tables and create
      per-arch string translation tables for the KVM ioctl commands for the
      architectures supporting KVM, such as s/390, PowerPC, x86_64 and ARM.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-jmcf78tqiudgn46zqfw2tgt2@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1b8f5160
    • A
      tools headers: Synchronize sound/asound.h · 6bc7626c
      Arnaldo Carvalho de Melo 提交于
      To pick up the changes from this cset:
      
        823dbb6e ("ALSA: pcm: add SNDRV_PCM_FORMAT_{S,U}20")
      
      It doesn't affect how the tools are built, this os done just to silence
      this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h'
      
      Right now tools/perf uses this header to generate string tables to
      translate ioctl commands in 'perf trace', see
      tools/perf/trace/beauty/sndrv_pcm_ioctl.sh, here is an example
      of a strace like system wide session, for one second:
      
        # perf trace -a -e ioctl sleep 1
           0.000 ( 0.019 ms): alsa-sink-HDMI/4219 ioctl(fd: 47</dev/snd/pcmC0D8p>, cmd: SNDRV_PCM_HWSYNC, arg: 0x107b) = 0
           0.081 ( 0.006 ms): alsa-sink-HDMI/4219 ioctl(fd: 47</dev/snd/pcmC0D8p>, cmd: SNDRV_PCM_HWSYNC, arg: 0x107b) = 0
           0.092 ( 0.006 ms): alsa-sink-HDMI/4219 ioctl(fd: 47</dev/snd/pcmC0D8p>, cmd: SNDRV_PCM_STATUS_EXT, arg: 0x7f6745ec4ae0) = 0
          10.178 ( 0.013 ms): alsa-sink-HDMI/4219 ioctl(fd: 47</dev/snd/pcmC0D8p>, cmd: SNDRV_PCM_HWSYNC, arg: 0x107b) = 0
          10.229 ( 0.005 ms): alsa-sink-HDMI/4219 ioctl(fd: 47</dev/snd/pcmC0D8p>, cmd: SNDRV_PCM_HWSYNC, arg: 0x107b) = 0
          10.238 ( 0.013 ms): alsa-sink-HDMI/4219 ioctl(fd: 47</dev/snd/pcmC0D8p>, cmd: SNDRV_PCM_STATUS_EXT, arg: 0x7f6745ec4ae0) = 0
          10.368 ( 0.009 ms): threaded-ml/26440 ioctl(fd: 141<socket:[111353]>, cmd: TIOCLINUX, arg: 0x7f8f70d2e1a4) = 0
          10.495 ( 0.018 ms): alsa-sink-HDMI/4219 ioctl(fd: 47</dev/snd/pcmC0D8p>, cmd: SNDRV_PCM_HWSYNC, arg: 0x107b) = 0
          10.526 ( 0.005 ms): alsa-sink-HDMI/4219 ioctl(fd: 47</dev/snd/pcmC0D8p>, cmd: SNDRV_PCM_HWSYNC, arg: 0x107b) = 0
          19.695 ( 0.018 ms): alsa-sink-HDMI/4219 ioctl(fd: 47</dev/snd/pcmC0D8p>, cmd: SNDRV_PCM_HWSYNC, arg: 0x107b) = 0
          19.757 ( 0.006 ms): alsa-sink-HDMI/4219 ioctl(fd: 47</dev/snd/pcmC0D8p>, cmd: SNDRV_PCM_HWSYNC, arg: 0x107b) = 0
          19.767 ( 0.005 ms): alsa-sink-HDMI/4219 ioctl(fd: 47</dev/snd/pcmC0D8p>, cmd: SNDRV_PCM_STATUS_EXT, arg: 0x7f6745ec4ae0) = 0
      <BIG SNIP>
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-sfpeesn8w0pyn3fe7vf2xmfl@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      6bc7626c
  5. 26 1月, 2018 1 次提交
  6. 23 1月, 2018 1 次提交
  7. 19 1月, 2018 2 次提交
  8. 15 1月, 2018 1 次提交
    • J
      bpf: offload: add map offload infrastructure · a3884572
      Jakub Kicinski 提交于
      BPF map offload follow similar path to program offload.  At creation
      time users may specify ifindex of the device on which they want to
      create the map.  Map will be validated by the kernel's
      .map_alloc_check callback and device driver will be called for the
      actual allocation.  Map will have an empty set of operations
      associated with it (save for alloc and free callbacks).  The real
      device callbacks are kept in map->offload->dev_ops because they
      have slightly different signatures.  Map operations are called in
      process context so the driver may communicate with HW freely,
      msleep(), wait() etc.
      
      Map alloc and free callbacks are muxed via existing .ndo_bpf, and
      are always called with rtnl lock held.  Maps and programs are
      guaranteed to be destroyed before .ndo_uninit (i.e. before
      unregister_netdev() returns).  Map callbacks are invoked with
      bpf_devs_lock *read* locked, drivers must take care of exclusive
      locking if necessary.
      
      All offload-specific branches are marked with unlikely() (through
      bpf_map_is_dev_bound()), given that branch penalty will be
      negligible compared to IO anyway, and we don't want to penalize
      SW path unnecessarily.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      a3884572
  9. 08 1月, 2018 2 次提交
  10. 31 12月, 2017 1 次提交
  11. 18 12月, 2017 1 次提交
    • A
      libbpf: add support for bpf_call · 48cca7e4
      Alexei Starovoitov 提交于
      - recognize relocation emitted by llvm
      - since all regular function will be kept in .text section and llvm
        takes care of pc-relative offsets in bpf_call instruction
        simply copy all of .text to relevant program section while adjusting
        bpf_call instructions in program section to point to newly copied
        body of instructions from .text
      - do so for all programs in the elf file
      - set all programs types to the one passed to bpf_prog_load()
      
      Note for elf files with multiple programs that use different
      functions in .text section we need to do 'linker' style logic.
      This work is still TBD
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      48cca7e4
  12. 15 12月, 2017 1 次提交
    • I
      tools/headers: Synchronize kernel <-> tooling headers · 643e345c
      Ingo Molnar 提交于
      Two kernel headers got modified recently, which are used by tooling as well:
      
       tools/include/uapi/linux/kvm.h
       arch/x86/include/asm/cpufeatures.h
      
      None of those changes have an effect on tooling, so do a plain copy.
      
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      643e345c
  13. 13 12月, 2017 4 次提交
  14. 12 12月, 2017 1 次提交
  15. 09 12月, 2017 1 次提交
  16. 05 12月, 2017 1 次提交
  17. 29 11月, 2017 7 次提交
  18. 21 11月, 2017 2 次提交
  19. 17 11月, 2017 1 次提交
  20. 16 11月, 2017 1 次提交
  21. 11 11月, 2017 3 次提交
  22. 05 11月, 2017 2 次提交