1. 02 Apr, 2015 16 commits
  2. 30 Mar, 2015 1 commit
  3. 27 Mar, 2015 6 commits
    • perf: Add per event clockid support · 34f43927
      Peter Zijlstra committed
      While thinking on the whole clock discussion it occurred to me we have
      two distinct uses of time:
      
       1) the tracking of event/ctx/cgroup enabled/running/stopped times
          which includes the self-monitoring support in struct
          perf_event_mmap_page.
      
       2) the actual timestamps visible in the data records.
      
      And we've been conflating them.
      
      The first is all about tracking time deltas, nobody should really care
      in what time base that happens, it's all relative information, as long
      as it's internally consistent it works.
      
      The second however is what people are worried about when having to
      merge their data with external sources. And here we have the
      discussion on MONOTONIC vs MONOTONIC_RAW etc..
      
      Where MONOTONIC is good for correlating between machines (static
      offset), MONOTONIC_RAW is required for correlating against a fixed rate
      hardware clock.
      
      This means configurability; now 1) makes that hard because it needs to
      be internally consistent across groups of unrelated events; which is
      why we had to have a global perf_clock().
      
      However, for 2) it doesn't really matter, perf itself doesn't care
      what it writes into the buffer.
      
      The below patch makes the distinction between these two cases by
      adding perf_event_clock() which is used for the second case. It
      further makes this configurable on a per-event basis, but adds a few
      sanity checks such that we cannot combine events with different clocks
      in confusing ways.
      
      And since we then have per-event configurability we might as well
      retain the 'legacy' behaviour as a default.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
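As a rough illustration of the per-event clock selection described in the commit above, the C sketch below fills in a `perf_event_attr` that asks for CLOCK_MONOTONIC_RAW timestamps in its sample records. The `use_clockid` and `clockid` fields come from the upstream uapi header `linux/perf_event.h`. The helper names `init_clockid_attr()` and `open_clockid_event()`, the software cpu-clock event, and the sample period are assumptions made for the example and do not appear in the changelog.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Hypothetical helper: describe a sampling event whose record
 * timestamps should come from the clock `clk`. */
static void init_clockid_attr(struct perf_event_attr *attr, clockid_t clk)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;          /* assumption: any event type would do */
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->sample_period = 1000000;            /* arbitrary value for the example */
    attr->sample_type = PERF_SAMPLE_TIME;     /* put a timestamp in each record */
    attr->use_clockid = 1;                    /* opt in to a per-event clock */
    attr->clockid = clk;                      /* e.g. CLOCK_MONOTONIC_RAW */
}

/* Hypothetical helper: open the event for the calling thread on any CPU.
 * Returns a file descriptor, or -1 with errno set (for example when the
 * kernel predates this feature or perf access is restricted). */
static int open_clockid_event(struct perf_event_attr *attr)
{
    return (int)syscall(SYS_perf_event_open, attr, 0, -1, -1, 0UL);
}
```

Leaving `use_clockid` at 0 keeps the legacy behaviour that the changelog retains as the default.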
    • perf/x86: Remove redundant calls to perf_pmu_{dis|en}able() · 9332d250
      David Ahern committed
      perf_pmu_disable() is called before pmu->add() and perf_pmu_enable() is called
      afterwards. No need to call these inside of x86_pmu_add() as well.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424281543-67335-1-git-send-email-dsahern@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • time: Rename timekeeper::tkr to timekeeper::tkr_mono · 876e7881
      Peter Zijlstra committed
      In preparation of adding another tkr field, rename this one to
      tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base,
      since the structure is not specific to CLOCK_MONOTONIC and the mono
      name got added to the tk_read_base instance.
      
      Lots of trivial churn.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Add INST_RETIRED.ALL workarounds · 294fe0f5
      Andi Kleen committed
      On Broadwell INST_RETIRED.ALL cannot be used with any period
      that doesn't have the lowest 6 bits cleared. And the period
      should not be smaller than 128.
      
      This is erratum BDM11 and BDM55:
      
        http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf
      
      BDM11: When using a period < 100; we may get incorrect PEBS/PMI
      interrupts and/or an invalid counter state.
      BDM55: When bit0-5 of the period are !0 we may get redundant PEBS
      records on overflow.
      
      Add a new callback to enforce this, and set it for Broadwell.
      
      How does this handle the case when an app requests a specific
      period with some of the bottom bits set?
      
      Short answer:
      
      Any useful instruction sampling period needs to be 4-6 orders
      of magnitude larger than 128, as a PMI every 128 instructions
      would instantly overwhelm the system and be throttled.
      So the +-64 error from this is really small compared to the
      period, much smaller than normal system jitter.
      
      Long answer (by Peterz):
      
      IFF we guarantee perf_event_attr::sample_period >= 128.
      
      Suppose we start out with sample_period=192; then we'll set period_left
      to 192, we'll end up with left = 128 (we truncate the lower bits). We
      get an interrupt, find that period_left = 64 (>0 so we return 0 and
      don't get an overflow handler), up that to 128. Then we trigger again,
      at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
      an overflow). We increment with sample_period so we get left = 128. We
      fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
      overflow). And on and on.
      
      So while the individual interrupts are 'wrong' we get them with
      interval=256,128 in exactly the right ratio to average out at 192. And
      this works for everything >=128.
      
      So the num_samples*fixed_period thing is still entirely correct +- 127,
      which is good enough I'd say, as you already have that error anyhow.
      
      So no need to 'fix' the tools, all we need to do is refuse to create
      INST_RETIRED.ALL events with sample_period < 128.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      [ Updated comments and changelog a bit. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
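The "long answer" in the commit above can be replayed with a small model. The C sketch below is a user-space simulation of that arithmetic only, not the kernel callback. The names `demo_limit_period()` and `demo_simulate()` are hypothetical. The walkthrough's numbers (192 truncated to 128) correspond to rounding the programmed value down to a multiple of 128, so the sketch assumes that granularity in order to reproduce them.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MIN_PERIOD   128  /* smallest value ever programmed */
#define DEMO_GRANULARITY  128  /* assumption: matches the walkthrough's numbers */

/* Clamp to the minimum, then round down to the assumed granularity. */
static int64_t demo_limit_period(int64_t left)
{
    if (left < DEMO_MIN_PERIOD)
        left = DEMO_MIN_PERIOD;
    return left - (left % DEMO_GRANULARITY);
}

/* Replay nr_irqs counter interrupts. Stores the instruction count n of
 * each reported overflow in overflow_at[] (up to max_overflows entries)
 * and returns how many overflows were reported in total. */
static int demo_simulate(int64_t sample_period, int nr_irqs,
                         int64_t *overflow_at, int max_overflows)
{
    int64_t period_left = sample_period;
    int64_t n = 0;
    int count = 0;

    assert(sample_period >= DEMO_MIN_PERIOD);

    for (int i = 0; i < nr_irqs; i++) {
        int64_t programmed = demo_limit_period(period_left);

        n += programmed;              /* the counter fires after this many events */
        period_left -= programmed;

        if (period_left <= 0) {       /* "return 1": an overflow is reported */
            period_left += sample_period;
            if (count < max_overflows)
                overflow_at[count] = n;
            count++;
        }
        /* otherwise "return 0": re-arm without calling the overflow handler */
    }
    return count;
}
```

With `sample_period = 192` and six interrupts, this model reports overflows at n = 256, 384, 640 and 768: intervals of 256 and 128 that average out to 192, as the walkthrough describes.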
    • perf/x86/intel: Add Broadwell core support · 91f1b705
      Andi Kleen committed
      Add Broadwell support to perf.
      
      The basic support is very similar to Haswell. We use the new cache
      event list added for Haswell earlier. The only differences
      are a few bits related to remote nodes. To avoid an extra,
      mostly identical, table these are patched up in the initialization code.
      
      The constraint list has one new event that needs to be handled over Haswell.
      
      Includes code and testing from Kan Liang.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Add new cache events table for Haswell · 0f1b5ca2
      Andi Kleen committed
      Haswell offcore events are quite different from Sandy Bridge.
      Add a new table to handle Haswell properly.
      
      Note that the offcore bits listed in the SDM are not quite correct
      (this is currently being fixed). An up-to-date list of bits is
      in the patch.
      
      The basic setup is similar to Sandy Bridge. The prefetch columns
      have been removed, as prefetch counting is not very reliable
      on Haswell. One L1 event that is no longer in the event list
      has also been removed.
      
      - data reads do not include code reads (comparable to earlier Sandy Bridge tables)
      - data counts include speculative execution (except L1 write, dtlb, bpu)
      - remote node access includes both remote memory, remote cache, remote mmio.
      - prefetches are not included in the counts for consistency
        (different from Sandy Bridge, which includes prefetches in the remote node)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      [ Removed the HSM30 comments; we don't have them for SNB/IVB either. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 26 Mar, 2015 2 commits
    • ARC: signal handling robustify · e4140819
      Vineet Gupta committed
      A malicious signal handler / restorer can DOS the system by fudging the
      user regs saved on stack, causing weird things such as sigreturn returning
      to user mode PC but cpu state still being kernel mode....
      
      Ensure that in sigreturn path status32 always has U bit; any other bogosity
      (garbage PC etc) will be taken care of by normal user mode exception mechanisms.
      
      Reproducer signal handler:
      
          void handle_sig(int signo, siginfo_t *info, void *context)
          {
              ucontext_t *uc = context;
              struct user_regs_struct *regs = &(uc->uc_mcontext.regs);

              regs->scratch.status32 = 0;
          }
      
      Before the fix, kernel would go off to weeds like below:
      
          --------->8-----------
          [ARCLinux]$ ./signal-test
          Path: /signal-test
          CPU: 0 PID: 61 Comm: signal-test Not tainted 4.0.0-rc5+ #65
          task: 8f177880 ti: 5ffe6000 task.ti: 8f15c000
      
          [ECR   ]: 0x00220200 => Invalid Write @ 0x00000010 by insn @ 0x00010698
          [EFA   ]: 0x00000010
          [BLINK ]: 0x2007c1ee
          [ERET  ]: 0x10698
          [STAT32]: 0x00000000 :                                   <--------
          BTA: 0x00010680	 SP: 0x5ffe7e48	 FP: 0x00000000
          LPS: 0x20003c6c	LPE: 0x20003c70	LPC: 0x00000000
          ...
          --------->8-----------
      Reported-by: Alexey Brodkin <abrodkin@synopsys.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
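The hardening idea in the commit above, forcing the user-mode bit back on whatever the handler wrote into the saved registers, can be sketched in a few lines of C. This is a simplified user-space model, not the ARC kernel code: `struct demo_user_regs`, `demo_sanitize_on_sigreturn()` and the bit position behind `DEMO_STATUS_U_MASK` are all assumptions made for illustration.

```c
#include <assert.h>

/* Assumption: illustrative position of the "user mode" bit in status32. */
#define DEMO_STATUS_U_MASK (1UL << 7)

/* Simplified stand-in for the register state restored on sigreturn. */
struct demo_user_regs {
    unsigned long status32;  /* CPU status flags, writable by the handler */
    unsigned long ret;       /* user PC to resume at */
};

/* Whatever the signal handler stored in the saved frame, make sure the
 * restored state cannot claim kernel mode. Other garbage (such as a bad
 * PC) is left alone and handled by ordinary user-mode exceptions. */
static void demo_sanitize_on_sigreturn(struct demo_user_regs *regs)
{
    regs->status32 |= DEMO_STATUS_U_MASK;
}
```

In this model, a handler that zeroes `status32`, as the reproducer does, still resumes with the user-mode bit set.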
    • ARC: SA_SIGINFO ucontext regs off-by-one · 6914e1e3
      Vineet Gupta committed
      The regfile provided to SA_SIGINFO signal handler as ucontext was off by
      one due to pt_regs gutter cleanups in 2013.
      
      Before handling signal, user pt_regs are copied onto user_regs_struct and copied
      back later. Both structs are binary compatible. This was all fine until
      commit 2fa91904 (ARC: pt_regs update #2) which removed the empty stack slot
      at top of pt_regs (corresponding to first pad) and made the corresponding
      fixup in struct user_regs_struct (the pad in there was moved out of
      @scratch - not removed altogether as it is part of ptrace ABI)
      
       struct user_regs_struct {
      +       long pad;
              struct {
      -               long pad;
                      long bta, lp_start, lp_end,....
              } scratch;
       ...
       }
      
      This meant that now user_regs_struct was off by 1 reg w.r.t pt_regs and
      signal code needs to use user_regs_struct.scratch to reflect it as pt_regs,
      which is what this commit does.
      
      This problem was hidden for 2 years, because both save/restore, despite
      using wrong location, were using the same location. Only an interim
      inspection (reproducer below) exposed the issue.
      
           #include <signal.h>
           #include <stdio.h>
           #include <ucontext.h>

           void handle_segv(int signo, siginfo_t *info, void *context)
           {
               ucontext_t *uc = context;
               struct user_regs_struct *regs = &(uc->uc_mcontext.regs);

               /* Before the fix this prints 7 8 (vs. the expected 8 9). */
               printf("regs %lx %lx\n",
                      regs->scratch.r8, regs->scratch.r9);
           }

           int main()
           {
               struct sigaction sa;

               sa.sa_sigaction = handle_segv;
               sa.sa_flags = SA_SIGINFO;
               sigemptyset(&sa.sa_mask);
               sigaction(SIGSEGV, &sa, NULL);

               /* Load known values so the handler can check them. */
               asm volatile(
               "mov	r7, 7	\n"
               "mov	r8, 8	\n"
               "mov	r9, 9	\n"
               "mov	r10, 10	\n"
               :::"r7","r8","r9","r10");

               /* Fault on purpose to raise SIGSEGV. */
               *((unsigned int*)0x10) = 0;
           }
      
      Fixes: 2fa91904 "ARC: pt_regs update #2: Remove unused gutter at start of pt_regs"
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
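The off-by-one described in the commit above can be reproduced with a reduced layout. The C sketch below is a model under stated assumptions, not the ARC signal code: the structs keep only three registers, every `demo_` name is hypothetical, and the "buggy" save is assumed to copy the kernel register block to the start of the user structure rather than to its `scratch` member.

```c
#include <assert.h>
#include <string.h>

/* Reduced stand-in for pt_regs after the leading gutter was removed.
 * Registers are kept in descending order for this example. */
struct demo_pt_regs {
    long r9, r8, r7;
};

/* Reduced stand-in for user_regs_struct: the pad survives (ptrace ABI)
 * but now sits outside of scratch. */
struct demo_user_regs_struct {
    long pad;
    struct {
        long r9, r8, r7;
    } scratch;
};

/* Assumed pre-fix behaviour: copy to the start of the struct, so every
 * register lands one slot early (r9 in pad, r8 in scratch.r9, ...). */
static void demo_save_buggy(struct demo_user_regs_struct *dst,
                            const struct demo_pt_regs *src)
{
    memcpy(dst, src, sizeof(*src));
}

/* Post-fix behaviour: copy to scratch, which is binary compatible
 * with the kernel register block. */
static void demo_save_fixed(struct demo_user_regs_struct *dst,
                            const struct demo_pt_regs *src)
{
    memcpy(&dst->scratch, src, sizeof(*src));
}
```

With r7=7, r8=8, r9=9 in the kernel block, the buggy save leaves `scratch.r8 == 7` and `scratch.r9 == 8`, the same "7 8 (vs. 8 9)" symptom the reproducer prints, while the fixed save yields 8 and 9.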
  5. 25 Mar, 2015 5 commits
  6. 24 Mar, 2015 3 commits
  7. 23 Mar, 2015 7 commits