1. 25 Jun, 2019 18 commits
    • A
      perf intel-pt: Add CBR value to decoder state · 51b09186
      Adrian Hunter authored
      For convenience, add the core-to-bus ratio (CBR) value to the decoder
      state.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20190622093248.581-4-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      51b09186
    • A
      perf intel-pt: Cater for CBR change in PSB+ · 91de8684
      Adrian Hunter authored
      PSB+ provides status information only so the core-to-bus ratio (CBR) in
      PSB+ will not have changed from its previous value. However, cater for
      the possibility of another CBR change that gets caught up in the PSB+
      anyway.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20190622093248.581-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      91de8684
    • A
      perf intel-pt: Decoder to output CBR changes immediately · abe5a1d3
      Adrian Hunter authored
      The core-to-bus ratio (CBR) provides the CPU frequency. With branches
      enabled, the decoder was outputting CBR changes only when there was a
      branch. That loses the correct time of the change if the trace is not in
      context (e.g. not tracing kernel space). Change to output the CBR change
      immediately.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20190622093248.581-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      abe5a1d3
    • K
      perf tools: Increase MAX_NR_CPUS and MAX_CACHES · 9f94c7f9
      Kyle Meyer authored
      Attempting to profile 1024 or more CPUs with perf causes two errors:
      
        perf record -a
        [ perf record: Woken up X times to write data ]
        way too many cpu caches..
        [ perf record: Captured and wrote X MB perf.data (X samples) ]
      
        perf report -C 1024
        Error: failed to set  cpu bitmap
        Requested CPU 1024 too large. Consider raising MAX_NR_CPUS
      
      Increasing MAX_NR_CPUS from 1024 to 2048 and redefining MAX_CACHES as
      MAX_NR_CPUS * 4 returns normal functionality to perf:
      
        perf record -a
        [ perf record: Woken up X times to write data ]
        [ perf record: Captured and wrote X MB perf.data (X samples) ]
      
        perf report -C 1024
        ...
      Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190620193630.154025-1-meyerk@stormcage.eag.rdlabs.hpecorp.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9f94c7f9
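The effect of the limit change in the commit above can be pictured with a small sketch. The values 1024, 2048 and the `MAX_NR_CPUS * 4` relation come from the commit message; `OLD_MAX_NR_CPUS` and the helper `cpu_within_limit()` are hypothetical names used only for illustration and are not perf's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Values taken from the commit message; OLD_MAX_NR_CPUS is an illustrative name. */
#define OLD_MAX_NR_CPUS 1024
#define MAX_NR_CPUS     2048
#define MAX_CACHES      (MAX_NR_CPUS * 4)

/*
 * Hypothetical helper: a CPU number is usable only if it indexes into a
 * bitmap sized for `limit` CPUs, that is 0 <= cpu < limit.
 */
bool cpu_within_limit(int cpu, int limit)
{
	assert(limit > 0);
	return cpu >= 0 && cpu < limit;
}
```

Under this sketch, CPU 1024 falls outside the old limit of 1024 and inside the new limit of 2048, which matches the `perf report -C 1024` behaviour described in the commit.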
    • A
      perf thread-stack: Eliminate code duplicating thread_stack__pop_ks() · eb5d8544
      Adrian Hunter authored
      Use new function thread_stack__pop_ks() in place of equivalent code.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20190619064429.14940-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      eb5d8544
    • A
      perf thread-stack: Fix thread stack return from kernel for kernel-only case · 97860b48
      Adrian Hunter authored
      Commit f08046cb ("perf thread-stack: Represent jmps to the start of a
      different symbol") had the side-effect of introducing more stack entries
      before return from kernel space.
      
      When user space is also traced, those entries are popped before entry to
      user space, but when user space is not traced, they get stuck at the
      bottom of the stack, making the stack grow progressively larger.
      
      Fix by detecting a return-from-kernel branch type, and popping kernel
      addresses from the stack then.
      
      Note, the problem and fix affect the exported Call Graph / Tree but not
      the callindent option used by "perf script --call-trace".
      
      Example:
      
        perf-with-kcore record example -e intel_pt//k -- ls
        perf-with-kcore script example --itrace=bep -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py example.db branches calls
        ~/libexec/perf-core/scripts/python/exported-sql-viewer.py example.db
      
        Menu option: Reports -> Context-Sensitive Call Graph
      
        Before: (showing Call Path column only)
      
          Call Path
           perf
          ▼ ls
            ▼ 12111:12111
               setup_new_exec
               __task_pid_nr_ns
               perf_event_pid_type
               perf_event_comm_output
               perf_iterate_ctx
               perf_iterate_sb
               perf_event_comm
               __set_task_comm
               load_elf_binary
               search_binary_handler
               __do_execve_file.isra.41
               __x64_sys_execve
               do_syscall_64
              ▼ entry_SYSCALL_64_after_hwframe
                ▼ swapgs_restore_regs_and_return_to_usermode
                  ▼ native_iret
                     error_entry
                     do_page_fault
                    ▼ error_exit
                      ▼ retint_user
                         prepare_exit_to_usermode
                        ▼ native_iret
                           error_entry
                           do_page_fault
                          ▼ error_exit
                            ▼ retint_user
                               prepare_exit_to_usermode
                              ▼ native_iret
                                 error_entry
                                 do_page_fault
                                ▼ error_exit
                                  ▼ retint_user
                                     prepare_exit_to_usermode
                                     native_iret
      
        After: (showing Call Path column only)
      
          Call Path
           perf
          ▼ ls
            ▼ 12111:12111
               setup_new_exec
               __task_pid_nr_ns
               perf_event_pid_type
               perf_event_comm_output
               perf_iterate_ctx
               perf_iterate_sb
               perf_event_comm
               __set_task_comm
               load_elf_binary
               search_binary_handler
               __do_execve_file.isra.41
               __x64_sys_execve
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               page_fault
              ▼ entry_SYSCALL_64
                ▼ do_syscall_64
                   __x64_sys_brk
                   __x64_sys_access
                   __x64_sys_openat
                   __x64_sys_newfstat
                   __x64_sys_mmap
                   __x64_sys_close
                   __x64_sys_read
                   __x64_sys_mprotect
                   __x64_sys_arch_prctl
                   __x64_sys_munmap
                   exit_to_usermode_loop
                   __x64_sys_set_tid_address
                   __x64_sys_set_robust_list
                   __x64_sys_rt_sigaction
                   __x64_sys_rt_sigprocmask
                   __x64_sys_prlimit64
                   __x64_sys_statfs
                   __x64_sys_ioctl
                   __x64_sys_getdents64
                   __x64_sys_write
                   __x64_sys_exit_group
      
      Committer notes:
      
      The first argument to perf-with-kcore needs to be the same for the
      'record' and 'script' lines. Otherwise the perf.data file and the
      kcore_dir/ files are recorded in one directory ('example') and then
      looked up in another (the 'bep' directory). Fix the instructions
      above so that both use 'example'.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: f08046cb ("perf thread-stack: Represent jmps to the start of a different symbol")
      Link: http://lkml.kernel.org/r/20190619064429.14940-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      97860b48
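The fix described in the commit above (pop kernel addresses when a return-from-kernel branch is seen) can be sketched roughly as follows. This is a simplified model, not perf's thread-stack code: `struct sketch_stack`, `sketch_push()`, `sketch_pop_kernel_entries()` and the `kernel_start` parameter are all hypothetical names, and an address is assumed to be a kernel address when it is at or above `kernel_start`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_STACK_MAX 64

/* Hypothetical, simplified call stack: just an array of return addresses. */
struct sketch_stack {
	uint64_t ret_addr[SKETCH_STACK_MAX];
	size_t cnt;
};

/* Push a return address; returns 0 on success, -1 if the stack is full. */
int sketch_push(struct sketch_stack *s, uint64_t addr)
{
	assert(s != NULL);
	if (s->cnt >= SKETCH_STACK_MAX)
		return -1;
	s->ret_addr[s->cnt++] = addr;
	return 0;
}

/*
 * Called when a return-from-kernel branch is detected: pop entries from
 * the top of the stack for as long as they hold kernel addresses.
 * Assumption: addresses >= kernel_start belong to kernel space.
 */
void sketch_pop_kernel_entries(struct sketch_stack *s, uint64_t kernel_start)
{
	assert(s != NULL);
	while (s->cnt > 0 && s->ret_addr[s->cnt - 1] >= kernel_start)
		s->cnt--;
}
```

In the kernel-only case every entry is a kernel address, so the stack is emptied on each return from the kernel instead of growing, which is the behaviour change visible in the "After" call graph.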
    • N
      perf tools: Fix cache.h include directive · 2d7102a0
      Numfor Mbiziwo-Tiapo authored
      Change the include path so that progress.c can find cache.h since it was
      previously searching in the wrong directory.
      
      Committer notes:
      
        $ ls -la tools/perf/ui/../cache.h
        ls: cannot access 'tools/perf/ui/../cache.h': No such file or directory
      
      So it really should include ../../util/cache.h, or plain cache.h, since
      we have -Iutil in INC_FLAGS in tools/perf/Makefile.config
      Signed-off-by: Numfor Mbiziwo-Tiapo <nums@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Luke Mujica <lukemujica@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      To: Ian Rogers <irogers@google.com>
      Link: https://lkml.kernel.org/n/tip-pud8usyutvd2npg2vpsygncz@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2d7102a0
    • I
      perf/cgroups: Don't rotate events for cgroups unnecessarily · fd7d5517
      Ian Rogers authored
      Currently perf_rotate_context assumes that if the context's nr_events !=
      nr_active a rotation is necessary for perf event multiplexing. With
      cgroups, nr_events is the total count of events for all cgroups and
      nr_active will not include events in a cgroup other than the current
      task's. This makes rotation appear necessary for cgroups when it is not.
      
      Add a perf_event_context flag that is set when rotation is necessary.
      Clear the flag during sched_out and set it when a flexible sched_in
      fails due to resources.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190601082722.44543-1-irogers@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fd7d5517
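The difference between the old heuristic and the flag-based rule in the commit above can be illustrated with a reduced model. `struct sketch_ctx`, the field name `rotate_necessary` and all function names here are assumptions made for the sketch; only the `nr_events` / `nr_active` comparison and the set/clear points are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the context fields named in the commit. */
struct sketch_ctx {
	int  nr_events;         /* total events, across all cgroups */
	int  nr_active;         /* events currently scheduled in */
	bool rotate_necessary;  /* assumed name for the new flag */
};

/* Old heuristic: rotate whenever not every event is active. */
bool sketch_needs_rotation_old(const struct sketch_ctx *ctx)
{
	assert(ctx != NULL);
	return ctx->nr_events != ctx->nr_active;
}

/* New rule: rotate only if the flag was set by a failed flexible sched_in. */
bool sketch_needs_rotation_new(const struct sketch_ctx *ctx)
{
	assert(ctx != NULL);
	return ctx->rotate_necessary;
}

/* The flag is cleared during sched_out. */
void sketch_sched_out(struct sketch_ctx *ctx)
{
	assert(ctx != NULL);
	ctx->rotate_necessary = false;
}

/* The flag is set when a flexible sched_in fails due to resources. */
void sketch_flexible_sched_in_failed(struct sketch_ctx *ctx)
{
	assert(ctx != NULL);
	ctx->rotate_necessary = true;
}
```

With four events spread over several cgroups and two of them active for the current task, the old check reports that rotation is needed while the flag-based check does not, unless a flexible event actually failed to schedule.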
    • J
      perf/x86/rapl: Get quirk state from new probe framework · 637d97b5
      Jiri Olsa authored
      Get the apply_quirk bool from the new rapl_model_match array.

      Because apply_quirk was the last remaining piece of data in
      rapl_cpu_match, replace rapl_cpu_match with rapl_model_match as the
      device table.

      This completes the switch to the new perf_msr_probe detection API.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-9-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      637d97b5
    • J
      perf/x86/rapl: Get attributes from new probe framework · 5fc1bd84
      Jiri Olsa authored
      We no longer need model-specific attribute arrays, because all of
      this is now detected through rapl_events_attrs.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-8-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5fc1bd84
    • J
      perf/x86/rapl: Get MSR values from new probe framework · 122f1c51
      Jiri Olsa authored
      There's no need to have special code for getting the bit and MSR
      value for a given event. We can now easily get them from the
      rapl_msrs array.

      Also get rid of RAPL_IDX_*, which is no longer needed, and replace
      the INTEL_RAPL* enums with PERF_RAPL* enums.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-7-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      122f1c51
    • J
      perf/x86/rapl: Get rapl_cntr_mask from new probe framework · cd105aed
      Jiri Olsa authored
      Get rapl_cntr_mask from the perf_msr_probe call, as a replacement
      for the current per-model intel_rapl_init_fun::cntr_mask value.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-6-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cd105aed
    • J
      perf/x86/rapl: Use new MSR detection interface · 5fb5273a
      Jiri Olsa authored
      Use the perf_msr_probe function to probe for RAPL MSRs.

      Add a new rapl_model_match device table that gathers event info
      for a given model, following the MSR and cstate module design.

      It will replace the current rapl_cpu_match device table and
      detection code in the following patches.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-5-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5fb5273a
    • J
      perf/x86/cstate: Use new probe function · 8f2a28c5
      Jiri Olsa authored
      Use the perf_msr_probe function to probe for cstate events.

      The functionality is the same, with one exception: perf_msr_probe
      checks that rdmsr returns a value != 0 for the given MSR register.

      Use the new attribute groups and add the events via
      pmu::attr_update.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-4-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8f2a28c5
    • J
      perf/x86/msr: Use new probe function · dde5e720
      Jiri Olsa authored
      Use the perf_msr_probe function to probe for msr events.

      The functionality is the same, with one exception: perf_msr_probe
      checks that rdmsr returns a value != 0 for the given MSR register.

      Use the new attribute groups and add the events via
      pmu::attr_update.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-3-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dde5e720
    • J
      perf/x86: Add MSR probe interface · 98253a54
      Jiri Olsa authored
      Add the perf_msr_probe function to provide an interface for
      checking an MSR register and setting the visibility of the related
      attribute group.

      The user defines the following struct for each MSR register:
      
        struct perf_msr {
             u64                       msr;
             struct attribute_group   *grp;
             bool                    (*test)(int idx, void *data);
             bool                      no_check;
        };
      
      Where:
        msr      - is the MSR address
        grp      - is the attribute group to add if the check passed
        test     - is the test function pointer
        no_check - is a bool that bypasses the check and adds the
                    attribute without any test
      
      The array of struct perf_msr is passed into:
      
        perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
      
      Together with:
        cnt  - which is the number of struct perf_msr array elements
        data - which is the user pointer passed to the test function
        zero - which allows counters that return zero on rdmsr

      perf_msr_probe will execute the test code, read the MSR and
      check that the value is != 0. If all these tests pass, the related
      attribute group is kept visible.

      Also add the PMU_EVENT_GROUP macro helper to define an attribute
      group for a single attribute. It will be used in the following patches.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-2-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      98253a54
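The probe flow described in the commit above (run the test, read the MSR, reject zero values unless `zero` is set, skip everything for `no_check`) can be modelled in user space along these lines. This is a sketch under stated assumptions, not the kernel implementation: `struct sketch_perf_msr` replaces the attribute group pointer with a plain `visible` flag, the MSR reader is passed in as a function pointer, `sketch_fake_rdmsr()` returns made-up values, and returning a bitmask of available entries is a choice made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for struct perf_msr. A plain `visible` flag plays
 * the role of the attribute group's visibility (assumption).
 */
struct sketch_perf_msr {
	uint64_t msr;                         /* MSR address */
	bool     visible;                     /* stands in for the attribute group */
	bool   (*test)(int idx, void *data);  /* optional extra test, may be NULL */
	bool     no_check;                    /* bypass all checks */
};

/* Hypothetical MSR reader: returns 0 on success and stores the value in *val. */
typedef int (*sketch_rdmsr_fn)(uint64_t msr, uint64_t *val);

/* Fake reader with made-up contents: 0x10 reads 5, 0x20 reads 0, others fail. */
int sketch_fake_rdmsr(uint64_t msr, uint64_t *val)
{
	assert(val != NULL);
	switch (msr) {
	case 0x10:
		*val = 5;
		return 0;
	case 0x20:
		*val = 0;
		return 0;
	default:
		return -1;
	}
}

/* Returns a bitmask with bit i set when entry i passed the probe. */
unsigned long sketch_msr_probe(struct sketch_perf_msr *msr, int cnt, bool zero,
			       void *data, sketch_rdmsr_fn rdmsr)
{
	unsigned long avail = 0;

	assert(msr != NULL && rdmsr != NULL);
	assert(cnt >= 0 && (size_t)cnt <= sizeof(avail) * 8);

	for (int bit = 0; bit < cnt; bit++) {
		uint64_t val = 0;

		msr[bit].visible = false;
		if (!msr[bit].no_check) {
			/* 1. Run the user-supplied test, if any. */
			if (msr[bit].test && !msr[bit].test(bit, data))
				continue;
			/* 2. The MSR must be readable. */
			if (rdmsr(msr[bit].msr, &val) != 0)
				continue;
			/* 3. A zero value only counts when `zero` allows it. */
			if (!zero && val == 0)
				continue;
		}
		msr[bit].visible = true;
		avail |= 1UL << bit;
	}
	return avail;
}
```

For an array holding MSR addresses 0x10, 0x20 and a `no_check` entry, the sketch marks the first and third entries available when `zero` is false, and all three when `zero` is true.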
    • I
      9e6e87e6
    • I
      Merge tag 'v5.2-rc6' into perf/core, to refresh branch · b9271f0c
      Ingo Molnar authored
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b9271f0c
  2. 24 Jun, 2019 5 commits
    • F
      Documentation/ABI: Document umwait control sysfs interfaces · 203dffac
      Fenghua Yu authored
      Since two new sysfs interface files are created for umwait control, add
      an ABI document entry for the files:
      
         /sys/devices/system/cpu/umwait_control/enable_c02
         /sys/devices/system/cpu/umwait_control/max_time
      
      [ tglx: Made the write value instructions readable ]
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Andy Lutomirski" <luto@kernel.org>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-6-git-send-email-fenghua.yu@intel.com
      203dffac
    • F
      x86/umwait: Add sysfs interface to control umwait maximum time · bd9a0c97
      Fenghua Yu authored
      IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
      that the processor can stay in C0.1 or C0.2. A zero value means no
      maximum time.
      
      Each instruction sets its own deadline in the instruction's implicit
      input EDX:EAX value. The instruction wakes up if the time-stamp counter
      reaches or exceeds the specified deadline, or the umwait maximum time
      expires, or a store happens in the monitored address range in umwait.
      
      The administrator can write an unsigned 32-bit number to
      /sys/devices/system/cpu/umwait_control/max_time to change the default
      value. Note that a value of zero means there is no limit. The lower two
      bits of the value must be zero.
      
      [ tglx: Simplify the write function. Massage changelog ]
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Andy Lutomirski" <luto@kernel.org>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-5-git-send-email-fenghua.yu@intel.com
      bd9a0c97
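The constraint on values written to max_time in the commit above (an unsigned 32-bit number whose lower two bits are zero, with zero meaning no limit) can be checked with a few lines of C. The macro and function names are hypothetical and this is not the kernel's sysfs handler; only the bit layout [31:2] and the two rules come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 31:2 carry the maximum time; the mask name is illustrative. */
#define SKETCH_UMWAIT_TIME_MASK 0xfffffffcu

/* A max_time value is acceptable only if its lower two bits are zero. */
bool sketch_max_time_valid(uint32_t max_time)
{
	return (max_time & ~SKETCH_UMWAIT_TIME_MASK) == 0;
}

/* Zero means that no maximum time is enforced. */
bool sketch_max_time_unlimited(uint32_t max_time)
{
	return max_time == 0;
}

/* Extract the time field for a value that has already been validated. */
uint32_t sketch_max_time_field(uint32_t max_time)
{
	assert(sketch_max_time_valid(max_time));
	return max_time & SKETCH_UMWAIT_TIME_MASK;
}
```

Under this sketch 100000 is accepted because it is a multiple of four, 100001 is rejected, and 0 is accepted and treated as unlimited.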
    • F
      x86/umwait: Add sysfs interface to control umwait C0.2 state · ff4b353f
      Fenghua Yu authored
      C0.2 state in umwait and tpause instructions can be enabled or disabled
      on a processor through the IA32_UMWAIT_CONTROL MSR register.

      By default, C0.2 is enabled and the user wait instructions result in
      lower power consumption with slower wakeup time.

      But in real-time systems, which require faster wakeup time even though
      the power savings could be smaller, the administrator needs to disable
      C0.2 so that all umwait invocations from user applications use C0.1.

      Create a sysfs interface which allows the administrator to control C0.2
      state during run time.

      Andy Lutomirski suggested turning off local irqs before writing the MSR to
      ensure the cached control value is not changed by a concurrent sysfs write
      from a different CPU via IPI.
      
      [ tglx: Simplified the update logic in the write function and got rid of
        all the convoluted type casts. Added a shared update function and
        made the namespace consistent. Moved the sysfs create invocation.
        Massaged changelog ]
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Andy Lutomirski" <luto@kernel.org>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-4-git-send-email-fenghua.yu@intel.com
      ff4b353f
    • F
      x86/umwait: Initialize umwait control values · bd688c69
      Fenghua Yu authored
      umwait or tpause allows the processor to enter a light-weight
      power/performance optimized state (C0.1 state) or an improved
      power/performance optimized state (C0.2 state) for a period specified by
      the instruction or until the system time limit or until a store to the
      monitored address range in umwait.
      
      IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on
      the processor and to set the maximum time the processor can reside in C0.1
      or C0.2.
      
      By default C0.2 is enabled so the user wait instructions can enter the
      C0.2 state to save more power with slower wakeup time.
      
      Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by
      default. A quote from Andy:
      
        "What I want to avoid is the case where it works dramatically differently
         on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
         behave a bit differently if the max timeout is hit, and I'd like that
         path to get exercised widely by making it happen even on default
         configs."
      
      A sysfs interface to adjust the time and the C0.2 enablement is provided in
      a follow up change.
      
      [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
        MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
        mask throughout the code.
        Massaged comments and changelog ]
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
      bd688c69
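How the two defaults in the commit above (C0.2 enabled, maximum time of 100000 cycles) could be packed into one control value is sketched below. The commit message states that the time occupies bits [31:2]; treating bit 0 as a "C0.2 disable" bit is an assumption of this sketch and should be checked against the Intel manuals. All macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UMWAIT_TIME_MASK   0xfffffffcu  /* bits 31:2: maximum time */
#define SKETCH_UMWAIT_C02_DISABLE 0x1u         /* bit 0: assumed disable bit */
#define SKETCH_UMWAIT_DEFAULT_MAX 100000u      /* default from the commit */

/* Pack a maximum time and the C0.2 setting into one 32-bit control value. */
uint32_t sketch_umwait_ctrl_val(uint32_t max_time, bool c02_enable)
{
	uint32_t val = max_time & SKETCH_UMWAIT_TIME_MASK;

	if (!c02_enable)
		val |= SKETCH_UMWAIT_C02_DISABLE;
	return val;
}

/* Unpack a control value back into its two settings. */
void sketch_umwait_ctrl_decode(uint32_t val, uint32_t *max_time, bool *c02_enable)
{
	assert(max_time != NULL && c02_enable != NULL);
	*max_time = val & SKETCH_UMWAIT_TIME_MASK;
	*c02_enable = !(val & SKETCH_UMWAIT_C02_DISABLE);
}
```

Keeping the time field and the enable bit in one cached value means a single MSR write can apply both settings, which fits the shared update function mentioned in the tglx note of the neighbouring commit.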
    • F
      x86/cpufeatures: Enumerate user wait instructions · 6dbbf5ec
      Fenghua Yu authored
      umonitor, umwait, and tpause are a set of user wait instructions.
      
      umonitor arms address monitoring hardware using an address. The
      address range is determined by using CPUID.0x5. A store to
      an address within the specified address range triggers the
      monitoring hardware to wake up the processor waiting in umwait.
      
      umwait instructs the processor to enter an implementation-dependent
      optimized state while monitoring a range of addresses. The optimized
      state may be either a light-weight power/performance optimized state
      (C0.1 state) or an improved power/performance optimized state
      (C0.2 state).
      
      tpause instructs the processor to enter an implementation-dependent
      optimized state (C0.1 or C0.2) and wake up when the time-stamp counter
      reaches the specified timeout.

      The three instructions may be executed at any privilege level.

      The instructions provide a power-saving method while waiting in
      user space. Additionally, they can allow a sibling hyperthread to
      make faster progress while this thread is waiting. One example of an
      application usage of umwait is when waiting for input data from another
      application, such as a user level multi-threaded packet processing
      engine.
      
      Availability of the user wait instructions is indicated by the presence
      of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].
      
      Detailed information on the instructions and CPUID feature WAITPKG flag
      can be found in the latest Intel Architecture Instruction Set Extensions
      and Future Features Programming Reference and Intel 64 and IA-32
      Architectures Software Developer's Manual.
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-2-git-send-email-fenghua.yu@intel.com
      6dbbf5ec
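The feature check named in the commit above, CPUID.0x07.0x0:ECX[5], comes down to testing one bit of a register value. The sketch below only does the bit test on a value the caller supplies, so it stays portable; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* WAITPKG is reported in bit 5 of ECX for CPUID leaf 7, subleaf 0. */
#define SKETCH_WAITPKG_ECX_BIT 5

/* Returns true when the given leaf-7 ECX value advertises WAITPKG. */
bool sketch_ecx_has_waitpkg(uint32_t leaf7_ecx)
{
	/* The shift must stay within the 32-bit register. */
	assert(SKETCH_WAITPKG_ECX_BIT < 32);
	return ((leaf7_ecx >> SKETCH_WAITPKG_ECX_BIT) & 1u) != 0;
}
```

On x86 with GCC or Clang, the ECX value can typically be obtained with `__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`; that call is left out of the sketch because it does not build on other architectures.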
  3. 23 Jun, 2019 7 commits
    • L
      Linux 5.2-rc6 · 4b972a01
      Linus Torvalds authored
      4b972a01
    • L
      Merge tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 6698a71a
      Linus Torvalds authored
      Pull iommu fix from Joerg Roedel:
       "Revert a commit from the previous pile of fixes which causes new
        lockdep splats. It is better to revert it for now and work on a better
        and more well tested fix"
      
      * tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
      6698a71a
    • P
      Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock" · 0aafc8ae
      Peter Xu committed
      This reverts commit 7560cc3c.
      
      With 5.2.0-rc5 I can easily trigger this with lockdep and iommu=pt:
      
          ======================================================
          WARNING: possible circular locking dependency detected
          5.2.0-rc5 #78 Not tainted
          ------------------------------------------------------
          swapper/0/1 is trying to acquire lock:
          00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0
          but task is already holding lock:
          00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
          which lock already depends on the new lock.
          the existing dependency chain (in reverse order) is:
          -> #1 (device_domain_lock){....}:
                 _raw_spin_lock_irqsave+0x3c/0x50
                 dmar_insert_one_dev_info+0xbb/0x510
                 domain_add_dev_info+0x50/0x90
                 dev_prepare_static_identity_mapping+0x30/0x68
                 intel_iommu_init+0xddd/0x1422
                 pci_iommu_init+0x16/0x3f
                 do_one_initcall+0x5d/0x2b4
                 kernel_init_freeable+0x218/0x2c1
                 kernel_init+0xa/0x100
                 ret_from_fork+0x3a/0x50
          -> #0 (&(&iommu->lock)->rlock){+.+.}:
                 lock_acquire+0x9e/0x170
                 _raw_spin_lock+0x25/0x30
                 domain_context_mapping_one+0xa5/0x4e0
                 pci_for_each_dma_alias+0x30/0x140
                 dmar_insert_one_dev_info+0x3b2/0x510
                 domain_add_dev_info+0x50/0x90
                 dev_prepare_static_identity_mapping+0x30/0x68
                 intel_iommu_init+0xddd/0x1422
                 pci_iommu_init+0x16/0x3f
                 do_one_initcall+0x5d/0x2b4
                 kernel_init_freeable+0x218/0x2c1
                 kernel_init+0xa/0x100
                 ret_from_fork+0x3a/0x50
      
          other info that might help us debug this:
           Possible unsafe locking scenario:
                 CPU0                    CPU1
                 ----                    ----
            lock(device_domain_lock);
                                         lock(&(&iommu->lock)->rlock);
                                         lock(device_domain_lock);
            lock(&(&iommu->lock)->rlock);
      
           *** DEADLOCK ***
          2 locks held by swapper/0/1:
           #0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422
           #1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
      
          stack backtrace:
          CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78
          Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018
          Call Trace:
           dump_stack+0x85/0xc0
           print_circular_bug.cold.57+0x15c/0x195
           __lock_acquire+0x152a/0x1710
           lock_acquire+0x9e/0x170
           ? domain_context_mapping_one+0xa5/0x4e0
           _raw_spin_lock+0x25/0x30
           ? domain_context_mapping_one+0xa5/0x4e0
           domain_context_mapping_one+0xa5/0x4e0
           ? domain_context_mapping_one+0x4e0/0x4e0
           pci_for_each_dma_alias+0x30/0x140
           dmar_insert_one_dev_info+0x3b2/0x510
           domain_add_dev_info+0x50/0x90
           dev_prepare_static_identity_mapping+0x30/0x68
           intel_iommu_init+0xddd/0x1422
           ? printk+0x58/0x6f
           ? lockdep_hardirqs_on+0xf0/0x180
           ? do_early_param+0x8e/0x8e
           ? e820__memblock_setup+0x63/0x63
           pci_iommu_init+0x16/0x3f
           do_one_initcall+0x5d/0x2b4
           ? do_early_param+0x8e/0x8e
           ? rcu_read_lock_sched_held+0x55/0x60
           ? do_early_param+0x8e/0x8e
           kernel_init_freeable+0x218/0x2c1
           ? rest_init+0x230/0x230
           kernel_init+0xa/0x100
           ret_from_fork+0x3a/0x50
      
      domain_context_mapping_one() is taking device_domain_lock first then
      iommu lock, while dmar_insert_one_dev_info() is doing the reverse.
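
To make the inversion concrete, the following is a small sketch in C of the kind of bookkeeping that can flag two locks being taken in opposite orders. It is a toy, not lockdep: it only catches direct two-lock inversions, not longer dependency chains. The enum values mirror the two lock names in the report; `struct order_tracker`, `tracker_acquire()` and `tracker_release()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical lock identifiers named after the two locks in the report. */
enum lock_id { DEVICE_DOMAIN_LOCK, IOMMU_LOCK, NR_LOCKS };

struct order_tracker {
    /* after[a][b] is true once b has been acquired while a was held. */
    bool after[NR_LOCKS][NR_LOCKS];
    bool held[NR_LOCKS];
};

/*
 * Record that lock l is being acquired.
 * Returns false if this acquisition inverts an order seen earlier.
 */
static bool tracker_acquire(struct order_tracker *t, enum lock_id l)
{
    bool ok = true;

    for (int h = 0; h < NR_LOCKS; h++) {
        if (!t->held[h] || h == (int)l)
            continue;
        /* Earlier, h was taken while l was held: the order is inverted. */
        if (t->after[l][h])
            ok = false;
        t->after[h][l] = true;
    }
    t->held[l] = true;
    return ok;
}

/* Record that lock l has been released. */
static void tracker_release(struct order_tracker *t, enum lock_id l)
{
    t->held[l] = false;
}
```

If one code path acquires DEVICE_DOMAIN_LOCK and then IOMMU_LOCK, and a later path acquires them in the opposite order, the second acquisition on that later path returns false. That is the AB/BA pattern shown in the "Possible unsafe locking scenario" table.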
      
      That was presumably introduced by commit:
      
      7560cc3c ("iommu/vt-d: Fix lock inversion between iommu->lock and
                    device_domain_lock", 2019-05-27)
      
      So far I still cannot figure out how the previous deadlock was
      triggered (I cannot find the iommu lock being taken before the call
      to iommu_flush_dev_iotlb()). However, I'm fairly sure that change is
      at least incomplete, because it does not fix all the places and we
      are still taking the locks in different orders. Reverting that
      commit, by contrast, leaves a clean rule: always take
      device_domain_lock first, then the iommu lock.
      
      We can continue to try to find the real culprit mentioned in
      7560cc3c, but for now I think we should revert it to fix current
      breakage.
      
      CC: Joerg Roedel <joro@8bytes.org>
      CC: Lu Baolu <baolu.lu@linux.intel.com>
      CC: dave.jiang@intel.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      0aafc8ae
    • L
      Merge tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · b253d5f3
      Linus Torvalds committed
      Pull PCI fix from Bjorn Helgaas:
       "If an IOMMU is present, ignore the P2PDMA whitelist we added for v5.2
        because we don't yet know how to support P2PDMA in that case (Logan
        Gunthorpe)"
      
      * tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
      b253d5f3
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · f4102766
      Linus Torvalds committed
      Pull SCSI fixes from James Bottomley:
       "Three driver fixes (and one version number update): a suspend hang in
        ufs, a qla hard lock on module removal and a qedi panic during
        discovery"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: qla2xxx: Fix hardlockup in abort command during driver remove
        scsi: ufs: Avoid runtime suspend possibly being blocked forever
        scsi: qedi: update driver version to 8.37.0.20
        scsi: qedi: Check targetname while finding boot target information
      f4102766
    • L
      Merge tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · a8282bf0
      Linus Torvalds committed
      Pull powerpc fixes from Michael Ellerman:
       "This is a frustratingly large batch at rc5. Some of these were sent
        earlier but were missed by me due to being distracted by other things,
        and some took a while to track down due to needing manual bisection on
        old hardware. But still we clearly need to improve our testing of KVM,
        and of 32-bit, so that we catch these earlier.
      
        Summary: seven fixes, all for bugs introduced this cycle.
      
         - The commit to add KASAN support broke booting on 32-bit SMP
           machines, due to a refactoring that moved some setup out of the
           secondary CPU path.
      
         - A fix for another 32-bit SMP bug introduced by the fast syscall
           entry implementation for 32-bit BOOKE. And a build fix for the same
           commit.
      
         - Our change to allow the DAWR to be force enabled on Power9
           introduced a bug in KVM, where we clobber r3 leading to a host
           crash.
      
         - The same commit also exposed a previously unreachable bug in the
           nested KVM handling of DAWR, which could lead to an oops in a
           nested host.
      
         - One of the DMA reworks broke the b43legacy WiFi driver on some
           people's powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit.
      
         - A fix for TLB flushing in KVM introduced a new bug, as it neglected
           to also flush the ERAT, this could lead to memory corruption in the
           guest.
      
        Thanks to: Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry
        Finger, Michael Neuling, Suraj Jitindar Singh"
      
      * tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
        powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac
        KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode
        KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
        powerpc/32: fix build failure on book3e with KVM
        powerpc/booke: fix fast syscall entry on SMP
        powerpc/32s: fix initial setup of segment registers on secondary CPU
      a8282bf0
    • M
      Bluetooth: Fix regression with minimum encryption key size alignment · 693cd8ce
      Marcel Holtmann committed
      When trying to align the minimum encryption key size requirement for
      Bluetooth connections, it turns out doing this in a central location in
      the HCI connection handling code is not possible.
      
      Bluetooth versions up to 2.0 used a security model where the
      L2CAP service would enforce authentication and encryption.  Starting
      with Bluetooth 2.1 and Secure Simple Pairing, that model changed: the
      connection initiator is responsible for providing an encrypted
      ACL link before any L2CAP communication can happen.
      
      Connecting Bluetooth 2.1 or later devices with Bluetooth 2.0 or
      earlier devices now causes a regression.  The encryption key size
      check needs to be moved out of the HCI connection handling into the
      L2CAP channel setup.
      
      To achieve this, the current check inside hci_conn_security() has been
      moved into the l2cap_check_enc_key_size() helper function, which is
      called from four decision points inside L2CAP to cover all
      combinations of Secure Simple Pairing enabled devices and devices
      using legacy pairing and the legacy service security model.
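
As a rough illustration of such a helper, here is a user-space sketch in C. It is not the kernel code. `struct conn_state`, `check_enc_key_size()` and the `MIN_ENC_KEY_SIZE` value of 7 are assumptions made for the example, as is the choice to let unencrypted links pass; the commit message states none of these details.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the connection state the check consults. */
struct conn_state {
    bool encrypted;       /* is the link currently encrypted? */
    uint8_t enc_key_size; /* negotiated encryption key size in bytes */
};

/* Assumed minimum key size; the commit message does not give the value. */
#define MIN_ENC_KEY_SIZE 7

/*
 * Sketch of a key size check meant to be called at L2CAP channel setup.
 * Assumption: an unencrypted link carries no key size requirement, so
 * the minimum is only enforced once the link is actually encrypted.
 */
static bool check_enc_key_size(const struct conn_state *c)
{
    return !c->encrypted || c->enc_key_size >= MIN_ENC_KEY_SIZE;
}
```

Keeping the test in one small predicate lets each of the decision points call the same function instead of repeating the comparison.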
      
      Fixes: d5bb334a ("Bluetooth: Align minimum encryption key size for LE and BR/EDR connections")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203643
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      693cd8ce
  4. 22 Jun, 2019 10 commits