1. 02 8月, 2017 1 次提交
  2. 27 6月, 2017 1 次提交
    • L
      x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF · f8475cef
      Len Brown 提交于
      The goal of this change is to give users a uniform and meaningful
      result when they read /sys/...cpufreq/scaling_cur_freq
      on modern x86 hardware, as compared to what they get today.
      
      Modern x86 processors include the hardware needed
      to accurately calculate frequency over an interval --
      APERF, MPERF, and the TSC.
      
      Here we provide an x86 routine to make this calculation
      on supported hardware, and use it in preference to any
      driver driver-specific cpufreq_driver.get() routine.
      
      MHz is computed like so:
      
      MHz = base_MHz * delta_APERF / delta_MPERF
      
      MHz is the average frequency of the busy processor
      over a measurement interval.  The interval is
      defined to be the time between successive invocations
      of aperfmperf_khz_on_cpu(), which are expected to to
      happen on-demand when users read sysfs attribute
      cpufreq/scaling_cur_freq.
      
      As with previous methods of calculating MHz,
      idle time is excluded.
      
      base_MHz above is from TSC calibration global "cpu_khz".
      
      This x86 native method to calculate MHz returns a meaningful result
      no matter if P-states are controlled by hardware or firmware
      and/or if the Linux cpufreq sub-system is or is-not installed.
      
      When this routine is invoked more frequently, the measurement
      interval becomes shorter.  However, the code limits re-computation
      to 10ms intervals so that average frequency remains meaningful.
      
      Discerning users are encouraged to take advantage of
      the turbostat(8) utility, which can gracefully handle
      concurrent measurement intervals of arbitrary length.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f8475cef
  3. 31 10月, 2016 2 次提交
    • T
      x86/intel_rdt: Add schemata file · 60ec2440
      Tony Luck 提交于
      Last of the per resource group files. Also mode 0644. This one shows
      the resources available to the group. Syntax depends on whether the
      "cdp" mount option was given. With code/data prioritization disabled
      it is simply a list of masks for each cache domain. Initial value
      allows access to all of the L3 cache on all domains. E.g. on a 2 socket
      Broadwell:
              L3:0=fffff;1=fffff
      With CDP enabled, separate masks for data and instructions are provided:
              L3DATA:0=fffff;1=fffff
              L3CODE:0=fffff;1=fffff
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477692289-37412-9-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      60ec2440
    • F
      x86/intel_rdt: Add basic resctrl filesystem support · 5ff193fb
      Fenghua Yu 提交于
      Use kernfs as basis for our user interface filesystem. This patch
      supports mount/umount, and one mount parameter "cdp" to enable code/data
      prioritization (though all we do at this point is ensure that the system
      can support CDP).  The file system is not populated yet in this patch.
      
      [ tglx: Fixed up a few nits and added cdp handling in case of error ]
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477692289-37412-4-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5ff193fb
  4. 27 10月, 2016 1 次提交
    • F
      x86/intel_rdt: Add CONFIG, Makefile, and basic initialization · 78e99b4a
      Fenghua Yu 提交于
      Introduce CONFIG_INTEL_RDT_A (default: no, dependent on CPU_SUP_INTEL) to
      control inclusion of Resource Director Technology in the build.
      
      Simple init() routine just checks which features are present. If they are
      pr_info() one line summary for each feature for now.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477142405-32078-7-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      78e99b4a
  5. 25 10月, 2016 1 次提交
  6. 23 3月, 2016 1 次提交
    • D
      kernel: add kcov code coverage · 5c9a8750
      Dmitry Vyukov 提交于
      kcov provides code coverage collection for coverage-guided fuzzing
      (randomized testing).  Coverage-guided fuzzing is a testing technique
      that uses coverage feedback to determine new interesting inputs to a
      system.  A notable user-space example is AFL
      (http://lcamtuf.coredump.cx/afl/).  However, this technique is not
      widely used for kernel testing due to missing compiler and kernel
      support.
      
      kcov does not aim to collect as much coverage as possible.  It aims to
      collect more or less stable coverage that is function of syscall inputs.
      To achieve this goal it does not collect coverage in soft/hard
      interrupts and instrumentation of some inherently non-deterministic or
      non-interesting parts of kernel is disbled (e.g.  scheduler, locking).
      
      Currently there is a single coverage collection mode (tracing), but the
      API anticipates additional collection modes.  Initially I also
      implemented a second mode which exposes coverage in a fixed-size hash
      table of counters (what Quentin used in his original patch).  I've
      dropped the second mode for simplicity.
      
      This patch adds the necessary support on kernel side.  The complimentary
      compiler support was added in gcc revision 231296.
      
      We've used this support to build syzkaller system call fuzzer, which has
      found 90 kernel bugs in just 2 months:
      
        https://github.com/google/syzkaller/wiki/Found-Bugs
      
      We've also found 30+ bugs in our internal systems with syzkaller.
      Another (yet unexplored) direction where kcov coverage would greatly
      help is more traditional "blob mutation".  For example, mounting a
      random blob as a filesystem, or receiving a random blob over wire.
      
      Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
      coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
      typical coverage can be just a dozen of basic blocks (e.g.  an invalid
      input).  In such context gcov becomes prohibitively expensive as
      reset/collect coverage steps depend on total number of basic
      blocks/edges in program (in case of kernel it is about 2M).  Cost of
      kcov depends only on number of executed basic blocks/edges.  On top of
      that, kernel requires per-thread coverage because there are always
      background threads and unrelated processes that also produce coverage.
      With inlined gcov instrumentation per-thread coverage is not possible.
      
      kcov exposes kernel PCs and control flow to user-space which is
      insecure.  But debugfs should not be mapped as user accessible.
      
      Based on a patch by Quentin Casasnovas.
      
      [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
      [akpm@linux-foundation.org: unbreak allmodconfig]
      [akpm@linux-foundation.org: follow x86 Makefile layout standards]
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: David Drysdale <drysdale@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5c9a8750
  7. 17 2月, 2016 16 次提交
  8. 09 2月, 2016 5 次提交
  9. 30 1月, 2016 1 次提交
  10. 06 10月, 2015 1 次提交
    • K
      perf/x86: Add Intel cstate PMUs support · 7ce1346a
      Kan Liang 提交于
      This patch adds new PMUs to support cstate related free running
      (read-only) counters. These counters may be used simultaneously by other
      tools, such as turbostat. However, it still make sense to implement them
      in perf. Because we can conveniently collect them together with other
      events, and allow to use them from tools without special MSR access
      code.
      
      These counters include CORE_C*_RESIDENCY and PKG_C*_RESIDENCY.
      According to counters' scope and category, two PMUs are registered with
      the perf_event core subsystem.
      
       - 'cstate_core': The counter is available for each physical core. The
                        counters include CORE_C*_RESIDENCY.
      
       - 'cstate_pkg':  The counter is available for each physical package. The
                        counters include PKG_C*_RESIDENCY.
      
      The events are exposed in sysfs for use by perf stat and other tools.
      The files are:
      
        /sys/devices/cstate_core/events/c*-residency
        /sys/devices/cstate_pkg/events/c*-residency
      
      These events only support system-wide mode counting.
      The /sys/devices/cstate_*/cpumask file can be used by tools to figure
      out which CPUs to monitor by default.
      
      The PMU type (attr->type) is dynamically allocated and is available from
      /sys/devices/core_misc/type and /sys/device/cstate_*/type.
      
      Sampling is not supported.
      
      Here is an example.
      
       - To caculate the fraction of time when the core is running in C6 state
         CORE_C6_time% = CORE_C6_RESIDENCY / TSC
      
       # perf stat -x, -e"cstate_core/c6-residency/,msr/tsc/" -C0 -- taskset -c 0 sleep 5
      
         11838820015,,cstate_core/c6-residency/,5175919658,100.00
         11877130740,,msr/tsc/,5175922010,100.00
      
       For sleep, 99.7% of time we ran in C6 state.
      
       # perf stat -x, -e"cstate_core/c6-residency/,msr/tsc/" -C0 -- taskset -c 0 busyloop
      
         1253316,,cstate_core/c6-residency/,4360969154,100.00
         10012635248,,msr/tsc/,4360972366,100.00
      
       For busyloop, 0.01% of time we ran in C6 state.
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1443443404-8581-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7ce1346a
  11. 04 8月, 2015 1 次提交
  12. 02 4月, 2015 2 次提交
    • A
      perf/x86/intel/bts: Add BTS PMU driver · 8062382c
      Alexander Shishkin 提交于
      Add support for Branch Trace Store (BTS) via kernel perf event infrastructure.
      The difference with the existing implementation of BTS support is that this
      one is a separate PMU that exports events' trace buffers to userspace by means
      of AUX area of the perf buffer, which is zero-copy mapped into userspace.
      
      The immediate benefit is that the buffer size can be much bigger, resulting in
      fewer interrupts and no kernel side copying is involved and little to no trace
      data loss. Also, kernel code can be traced with this driver.
      
      The old way of collecting BTS traces still works.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1422614435-114702-1-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8062382c
    • A
      perf/x86/intel/pt: Add Intel PT PMU driver · 52ca9ced
      Alexander Shishkin 提交于
      Add support for Intel Processor Trace (PT) to kernel's perf events.
      PT is an extension of Intel Architecture that collects information about
      software execuction such as control flow, execution modes and timings and
      formats it into highly compressed binary packets. Even being compressed,
      these packets are generated at hundreds of megabytes per second per core,
      which makes it impractical to decode them on the fly in the kernel.
      
      This driver exports trace data by through AUX space in the perf ring
      buffer, which is zero-copy mapped into userspace for faster data retrieval.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1422614392-114498-1-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      52ca9ced
  13. 23 3月, 2015 1 次提交
  14. 23 12月, 2014 1 次提交
  15. 28 10月, 2014 1 次提交
  16. 18 8月, 2014 2 次提交
    • J
      x86: Support compiling out human-friendly processor feature names · 9def39be
      Josh Triplett 提交于
      The table mapping CPUID bits to human-readable strings takes up a
      non-trivial amount of space, and only exists to support /proc/cpuinfo
      and a couple of kernel messages.  Since programs depend on the format of
      /proc/cpuinfo, force inclusion of the table when building with /proc
      support; otherwise, support omitting that table to save space, in which
      case the kernel messages will print features numerically instead.
      
      In addition to saving 1408 bytes out of vmlinux, this also saves 1373
      bytes out of the uncompressed setup code, which contributes directly to
      the size of bzImage.
      Signed-off-by: NJosh Triplett <josh@joshtriplett.org>
      9def39be
    • J
      x86: Drop support for /proc files when !CONFIG_PROC_FS · 39f838e0
      Josh Triplett 提交于
      arch/x86/kernel/cpu/proc.c only exists to support files in /proc; omit that
      file when compiling without CONFIG_PROC_FS.
      
      Saves 645 additional bytes on 32-bit x86 when !CONFIG_PROC_FS:
      
      add/remove: 0/5 grow/shrink: 0/0 up/down: 0/-645 (-645)
      function                                     old     new   delta
      c_stop                                         1       -      -1
      c_next                                        11       -     -11
      cpuinfo_op                                    16       -     -16
      c_start                                       24       -     -24
      show_cpuinfo                                 593       -    -593
      Signed-off-by: NJosh Triplett <josh@joshtriplett.org>
      39f838e0
  17. 13 8月, 2014 2 次提交