1. 16 April 2019, 2 commits
    • perf/x86: Support outputting XMM registers · 878068ea
      Committed by Kan Liang
      Starting with Icelake, XMM registers can be collected in the PEBS record,
      but the current code only outputs the pt_regs.
      
      Add a new struct x86_perf_regs for both pt_regs and xmm_regs. The
      xmm_regs will be used later to keep a pointer to PEBS record which has
      XMM information.
      
      XMM registers are 128 bits wide. To simplify the code, each one is handled as
      two separate registers, which means setting two bits in the register bitmap.
      This also allows sampling only the lower 64 bits of an XMM register.
      
      The index of the XMM registers starts at 32. There are 16 XMM registers, so
      all of the reserved register space is now used. Remove REG_RESERVED.
      
      Add PERF_REG_X86_XMM_MAX, which stands for the max number of all x86
      regs including both GPRs and XMM.
      
      Add REG_NOSUPPORT for 32bit to exclude unsupported registers.
      
      Previous platforms cannot collect XMM information in the PEBS record. Add
      pebs_no_xmm_regs to mark the unsupported platforms.
      
      The common code still validates the supported registers. However, it cannot
      check model-specific registers such as XMM. Add an extra check in
      x86_pmu_hw_config() to reject invalid configurations of regs_user and
      regs_intr: regs_user never supports XMM collection, and regs_intr supports
      XMM collection only when sampling a PEBS event on Icelake and later
      platforms.
      Originally-by: Andi Kleen <ak@linux.intel.com>
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-3-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      878068ea
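      A minimal sketch of the new structure, based on the description above (the
      exact kernel definition may differ slightly):

        /* pt_regs plus a pointer to the XMM values found in the PEBS record */
        struct x86_perf_regs {
                struct pt_regs  regs;
                u64             *xmm_regs;      /* XMM0-XMM15, 128 bits = 2 x u64 each */
        };

      Because each XMM register is exposed as two 64-bit halves, requesting one in
      sample_regs_intr sets two bits in the register bitmap, starting at index 32.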
    • perf/x86/intel: Force resched when TFA sysctl is modified · f447e4eb
      Committed by Stephane Eranian
      This patch guarantees to the sysadmin that, when TFA is disabled, no PMU
      event is using PMC3 by the time the echo command returns; vice versa, when
      TFA is enabled, the PMU can use PMC3 immediately (to eliminate possible
      multiplexing).
      
        $ perf stat -a -I 1000 --no-merge -e branches,branches,branches,branches
           1.000123979    125,768,725,208      branches
           1.000562520    125,631,000,456      branches
           1.000942898    125,487,114,291      branches
           1.001333316    125,323,363,620      branches
           2.004721306    125,514,968,546      branches
           2.005114560    125,511,110,861      branches
           2.005482722    125,510,132,724      branches
           2.005851245    125,508,967,086      branches
           3.006323475    125,166,570,648      branches
           3.006709247    125,165,650,056      branches
           3.007086605    125,164,639,142      branches
           3.007459298    125,164,402,912      branches
           4.007922698    125,045,577,140      branches
           4.008310775    125,046,804,324      branches
           4.008670814    125,048,265,111      branches
           4.009039251    125,048,677,611      branches
           5.009503373    125,122,240,217      branches
           5.009897067    125,122,450,517      branches
      
      Then on another connection, sysadmin does:
      
        $ echo  1 >/sys/devices/cpu/allow_tsx_force_abort
      
      Then perf stat adjusts the events immediately:
      
           5.010286029    125,121,393,483      branches
           5.010646308    125,120,556,786      branches
           6.011113588    124,963,351,832      branches
           6.011510331    124,964,267,566      branches
           6.011889913    124,964,829,130      branches
           6.012262996    124,965,841,156      branches
           7.012708299    124,419,832,234      branches [79.69%]
           7.012847908    124,416,363,853      branches [79.73%]
           7.013225462    124,400,723,712      branches [79.73%]
           7.013598191    124,376,154,434      branches [79.70%]
           8.014089834    124,250,862,693      branches [74.98%]
           8.014481363    124,267,539,139      branches [74.94%]
           8.014856006    124,259,519,786      branches [74.98%]
           8.014980848    124,225,457,969      branches [75.04%]
           9.015464576    124,204,235,423      branches [75.03%]
           9.015858587    124,204,988,490      branches [75.04%]
           9.016243680    124,220,092,486      branches [74.99%]
           9.016620104    124,231,260,146      branches [74.94%]
      
      And vice versa if the sysadmin does:
      
        $ echo  0 >/sys/devices/cpu/allow_tsx_force_abort
      
      Events are again spread over the 4 counters:
      
          10.017096277    124,276,230,565      branches [74.96%]
          10.017237209    124,228,062,171      branches [75.03%]
          10.017478637    124,178,780,626      branches [75.03%]
          10.017853402    124,198,316,177      branches [75.03%]
          11.018334423    124,602,418,933      branches [85.40%]
          11.018722584    124,602,921,320      branches [85.42%]
          11.019095621    124,603,956,093      branches [85.42%]
          11.019467742    124,595,273,783      branches [85.42%]
          12.019945736    125,110,114,864      branches
          12.020330764    125,109,334,472      branches
          12.020688740    125,109,818,865      branches
          12.021054020    125,108,594,014      branches
          13.021516774    125,109,164,018      branches
          13.021903640    125,108,794,510      branches
          13.022270770    125,107,756,978      branches
          13.022630819    125,109,380,471      branches
          14.023114989    125,133,140,817      branches
          14.023501880    125,133,785,858      branches
          14.023868339    125,133,852,700      branches
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: kan.liang@intel.com
      Cc: nelson.dsouza@intel.com
      Cc: tonyj@suse.com
      Link: https://lkml.kernel.org/r/20190408173252.37932-3-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f447e4eb
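      A rough sketch of the sysfs write path this change introduces (helper names
      such as update_tfa_sched() and perf_pmu_resched() are assumptions based on
      the commit description, not necessarily the final kernel symbols):

        static void update_tfa_sched(void *ignored)
        {
                struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

                /* If PMC3 is currently in use, force a reschedule so the
                 * constraint change takes effect right away. */
                if (test_bit(3, cpuc->active_mask))
                        perf_pmu_resched(x86_get_pmu());
        }

        static ssize_t set_sysctl_tfa(struct device *cdev,
                                      struct device_attribute *attr,
                                      const char *buf, size_t count)
        {
                bool val;
                ssize_t ret;

                ret = kstrtobool(buf, &val);
                if (ret)
                        return ret;

                /* no change */
                if (val == allow_tsx_force_abort)
                        return count;

                allow_tsx_force_abort = val;

                /* Reschedule events on every CPU so that PMC3 is vacated (or
                 * becomes usable again) before the write returns. */
                get_online_cpus();
                on_each_cpu(update_tfa_sched, NULL, 1);
                put_online_cpus();

                return count;
        }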
  2. 10 April 2019, 1 commit
    • x86/perf/amd: Remove need to check "running" bit in NMI handler · 3966c3fe
      Committed by Lendacky, Thomas
      Spurious interrupt support was added to perf in the following commit, almost
      a decade ago:
      
        63e6be6d ("perf, x86: Catch spurious interrupts after disabling counters")
      
      The two previous patches (resolving the race condition when disabling a
      PMC and NMI latency mitigation) allow for the removal of this older
      spurious interrupt support.
      
      Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap
      is cleared before disabling the PMC, which sets up a race condition. This
      race condition was mitigated by introducing the running bitmap. That race
      condition can be eliminated by first disabling the PMC, waiting for PMC
      reset on overflow and then clearing the bit for the PMC in the active_mask
      bitmap. The NMI handler will not re-enable a disabled counter.
      
      If x86_pmu_stop() is called from the perf NMI handler, the NMI latency
      mitigation support will guard against any unhandled NMI messages.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3966c3fe
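      A simplified sketch of the reordered stop path described above (illustrative
      only; the AMD ->disable() callback is assumed to wait for an in-flight
      overflow, as added by the earlier patches in this series):

        static void x86_pmu_stop(struct perf_event *event, int flags)
        {
                struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
                struct hw_perf_event *hwc = &event->hw;

                if (test_bit(hwc->idx, cpuc->active_mask)) {
                        /* Disable the counter (and, on AMD, wait for any
                         * pending overflow to be reset) first ... */
                        x86_pmu.disable(event);
                        /* ... and only then mark it inactive, so the NMI
                         * handler never re-enables a disabled counter. */
                        __clear_bit(hwc->idx, cpuc->active_mask);
                        hwc->state |= PERF_HES_STOPPED;
                }
        }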
  3. 03 April 2019, 5 commits
    • perf/x86: Add sanity checks to x86_schedule_events() · f80deefa
      Committed by Peter Zijlstra
      By computing the 'committed' index earlier, we can use it to validate
      the cached constraint state.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f80deefa
    • perf/x86: Optimize x86_schedule_events() · 109717de
      Committed by Peter Zijlstra
      Now that cpuc->event_constraint[] is retained, we can avoid calling
      get_event_constraints() over and over again.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      109717de
    • perf/x86: Clear ->event_constraint[] on put · 2c9651c3
      Committed by Peter Zijlstra
      The current code unconditionally clears cpuc->event_constraint[i]
      before calling get_event_constraints(.idx=i). The only site that cares
      is intel_get_event_constraints() where the c1 load will always be
      NULL.
      
      However, always calling get_event_constraints() on all events is wasteful;
      most of the time it will return the exact same result. Therefore retain the
      logic in intel_get_event_constraints() and change the generic code to only
      clear the constraint on put.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2c9651c3
    • perf/x86: Remove PERF_X86_EVENT_COMMITTED · 1f6a1e2d
      Committed by Peter Zijlstra
      The flag PERF_X86_EVENT_COMMITTED is used to find uncommitted events
      for which to call put_event_constraint() when scheduling fails.
      
      These are the events newly added to the list and must, by definition, form
      the tail of cpuc->event_list[]. By computing the list index of the last
      successful schedule, iteration can start there, making the flag redundant.
      
      There are only 3 callers of x86_schedule_events(), notably:
      
       - x86_pmu_add()
       - x86_pmu_commit_txn()
       - validate_group()
      
      For x86_pmu_add(), cpuc->n_events isn't updated until after
      schedule_events() succeeds, therefore cpuc->n_events points to the
      desired index.
      
      For x86_pmu_commit_txn(), cpuc->n_events is updated, but we can
      trivially compute the desired value with cpuc->n_txn -- the number of
      events added in this transaction.
      
      For validate_group(), we can make the rule for x86_pmu_add() work by
      simply setting cpuc->n_events to 0 before calling schedule_events().
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1f6a1e2d
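      Sketch of the resulting error path (illustrative fragment, not the literal
      diff): with the index of the last successful schedule known, the failure
      case only has to undo the events added after it, and no per-event flag is
      needed:

        if (sched_failed) {
                /* i0 = cpuc->n_events for x86_pmu_add(),
                 *      n - cpuc->n_txn  for x86_pmu_commit_txn() */
                for (i = i0; i < n; i++)
                        x86_pmu.put_event_constraints(cpuc, cpuc->event_list[i]);
        }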
    • perf/x86: Simplify x86_pmu.get_constraints() interface · 21d65555
      Committed by Peter Zijlstra
      There is a special case for validate_events() where we'll call
      x86_pmu.get_constraints(.idx=-1). Its purpose, up until recently, seems to
      have been to avoid taking a previous constraint from
      cpuc->event_constraint[] in intel_get_event_constraints().
      
      (I could not find any other get_event_constraints() implementation
      using @idx)
      
      However, since that cpuc is freshly allocated, that array will in fact
      be initialized with NULL pointers, achieving the very same effect.
      
      Therefore remove this exception.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      21d65555
  4. 06 March 2019, 1 commit
  5. 11 February 2019, 1 commit
    • perf/x86: Add check_period PMU callback · 81ec3f3c
      Committed by Jiri Olsa
      Vince (and later on Ravi) reported crashes in the BTS code during
      fuzzing with the following backtrace:
      
        general protection fault: 0000 [#1] SMP PTI
        ...
        RIP: 0010:perf_prepare_sample+0x8f/0x510
        ...
        Call Trace:
         <IRQ>
         ? intel_pmu_drain_bts_buffer+0x194/0x230
         intel_pmu_drain_bts_buffer+0x160/0x230
         ? tick_nohz_irq_exit+0x31/0x40
         ? smp_call_function_single_interrupt+0x48/0xe0
         ? call_function_single_interrupt+0xf/0x20
         ? call_function_single_interrupt+0xa/0x20
         ? x86_schedule_events+0x1a0/0x2f0
         ? x86_pmu_commit_txn+0xb4/0x100
         ? find_busiest_group+0x47/0x5d0
         ? perf_event_set_state.part.42+0x12/0x50
         ? perf_mux_hrtimer_restart+0x40/0xb0
         intel_pmu_disable_event+0xae/0x100
         ? intel_pmu_disable_event+0xae/0x100
         x86_pmu_stop+0x7a/0xb0
         x86_pmu_del+0x57/0x120
         event_sched_out.isra.101+0x83/0x180
         group_sched_out.part.103+0x57/0xe0
         ctx_sched_out+0x188/0x240
         ctx_resched+0xa8/0xd0
         __perf_event_enable+0x193/0x1e0
         event_function+0x8e/0xc0
         remote_function+0x41/0x50
         flush_smp_call_function_queue+0x68/0x100
         generic_smp_call_function_single_interrupt+0x13/0x30
         smp_call_function_single_interrupt+0x3e/0xe0
         call_function_single_interrupt+0xf/0x20
         </IRQ>
      
      The reason is that while the event init code does several checks for BTS
      events and prevents several unwanted config bits for a BTS event (like
      precise_ip), PERF_EVENT_IOC_PERIOD allows creating a BTS event without
      those checks being done.
      
      The following sequence will cause the crash:
      
      If we create an 'almost' BTS event with precise_ip and callchains, and then
      turn it into a BTS event, it will crash the perf_prepare_sample() function
      because precise_ip events are expected to come in with callchain data
      initialized, but that's not the case for the intel_pmu_drain_bts_buffer()
      caller.
      
      Add a check_period callback to be called before the period is changed via
      PERF_EVENT_IOC_PERIOD. It will deny the change if the event would become a
      BTS event. Also add the limit_period check there as well.
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      81ec3f3c
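      A minimal sketch of the new callback (assumed helper names; the generic code
      is assumed to invoke it from the PERF_EVENT_IOC_PERIOD path before applying
      the change):

        static int intel_pmu_check_period(struct perf_event *event, u64 value)
        {
                /* Refuse a period change that would quietly turn the event
                 * into a BTS event, bypassing the checks done at init time. */
                return intel_pmu_has_bts_period(event, value) ? -EINVAL : 0;
        }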
  6. 22 November 2018, 1 commit
  7. 16 October 2018, 1 commit
  8. 29 September 2018, 1 commit
  9. 28 September 2018, 1 commit
  10. 10 September 2018, 1 commit
  11. 31 August 2018, 1 commit
  12. 13 June 2018, 1 commit
    • treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Committed by Kees Cook
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
      
              kmalloc(a * b, gfp)
      
      with:
              kmalloc_array(a, b, gfp)
      
      as well as handling cases of:
      
              kmalloc(a * b * c, gfp)
      
      with:
      
              kmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
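
      As a concrete (illustrative) example of the 2-factor conversion, a call such as:

              ptr = kmalloc(count * sizeof(struct foo), GFP_KERNEL);

      becomes the overflow-checked form:

              ptr = kmalloc_array(count, sizeof(struct foo), GFP_KERNEL);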
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kmalloc
      + kmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(sizeof(THING) * C2, ...)
      |
        kmalloc(sizeof(TYPE) * C2, ...)
      |
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(C1 * C2, ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: Kees Cook <keescook@chromium.org>
      6da2ec56
  13. 05 May 2018, 2 commits
    • perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map() · 46b1b577
      Committed by Peter Zijlstra
      > arch/x86/events/intel/cstate.c:307 cstate_pmu_event_init() warn: potential spectre issue 'pkg_msr' (local cap)
      > arch/x86/events/intel/core.c:337 intel_pmu_event_map() warn: potential spectre issue 'intel_perfmon_event_map'
      > arch/x86/events/intel/knc.c:122 knc_pmu_event_map() warn: potential spectre issue 'knc_perfmon_event_map'
      > arch/x86/events/intel/p4.c:722 p4_pmu_event_map() warn: potential spectre issue 'p4_general_events'
      > arch/x86/events/intel/p6.c:116 p6_pmu_event_map() warn: potential spectre issue 'p6_perfmon_event_map'
      > arch/x86/events/amd/core.c:132 amd_pmu_event_map() warn: potential spectre issue 'amd_perfmon_event_map'
      
      Userspace controls @attr, sanitize @attr->config before passing it on
      to x86_pmu::event_map().
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      46b1b577
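      The fix follows the usual Spectre-v1 pattern (sketch of the pattern, not the
      literal diff): bound-check the user-controlled index and clamp it under
      speculation (array_index_nospec() from <linux/nospec.h>) before using it to
      index the event map:

        if (attr->config >= x86_pmu.max_events)
                return -EINVAL;

        attr->config = array_index_nospec((unsigned long)attr->config,
                                          x86_pmu.max_events);

        /* only now is it safe to use as an array index */
        config = x86_pmu.event_map(attr->config);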
    • perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_* · ef9ee4ad
      Committed by Peter Zijlstra
      > arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids[cache_type]' (local cap)
      > arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids' (local cap)
      > arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs[cache_type]' (local cap)
      > arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs' (local cap)
      
      Userspace controls @config which contains 3 (byte) fields used for a 3
      dimensional array deref.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ef9ee4ad
  14. 19 April 2018, 1 commit
    • compat: Move compat_timespec/ timeval to compat_time.h · 0d55303c
      Committed by Deepa Dinamani
      All the current architecture specific defines for these
      are the same. Refactor these common defines to a common
      header file.
      
      The new common linux/compat_time.h is also useful as it
      will eventually be used to hold all the defines that
      are needed for compat time types that support non y2038
      safe types. New architectures need not define these
      new types as they will only use new y2038 safe syscalls.
      This file can be deleted after y2038 when we stop supporting
      non y2038 safe syscalls.
      
      The patch also requires an operation similar to:
      
      git grep "asm/compat\.h" | cut -d ":" -f 1 |  xargs -n 1 sed -i -e "s%asm/compat.h%linux/compat.h%g"
      
      Cc: acme@kernel.org
      Cc: benh@kernel.crashing.org
      Cc: borntraeger@de.ibm.com
      Cc: catalin.marinas@arm.com
      Cc: cmetcalf@mellanox.com
      Cc: cohuck@redhat.com
      Cc: davem@davemloft.net
      Cc: deller@gmx.de
      Cc: devel@driverdev.osuosl.org
      Cc: gerald.schaefer@de.ibm.com
      Cc: gregkh@linuxfoundation.org
      Cc: heiko.carstens@de.ibm.com
      Cc: hoeppner@linux.vnet.ibm.com
      Cc: hpa@zytor.com
      Cc: jejb@parisc-linux.org
      Cc: jwi@linux.vnet.ibm.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: mark.rutland@arm.com
      Cc: mingo@redhat.com
      Cc: mpe@ellerman.id.au
      Cc: oberpar@linux.vnet.ibm.com
      Cc: oprofile-list@lists.sf.net
      Cc: paulus@samba.org
      Cc: peterz@infradead.org
      Cc: ralf@linux-mips.org
      Cc: rostedt@goodmis.org
      Cc: rric@kernel.org
      Cc: schwidefsky@de.ibm.com
      Cc: sebott@linux.vnet.ibm.com
      Cc: sparclinux@vger.kernel.org
      Cc: sth@linux.vnet.ibm.com
      Cc: ubraun@linux.vnet.ibm.com
      Cc: will.deacon@arm.com
      Cc: x86@kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: James Hogan <jhogan@kernel.org>
      Acked-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      0d55303c
  15. 27 March 2018, 1 commit
  16. 20 March 2018, 2 commits
  17. 17 March 2018, 1 commit
  18. 16 March 2018, 1 commit
  19. 12 March 2018, 1 commit
    • perf/core: Remove perf_event::group_entry · 8343aae6
      Committed by Peter Zijlstra
      Now that all the grouping is done with RB trees, we no longer need
      group_entry and can replace the whole thing with sibling_list.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8343aae6
  20. 09 March 2018, 3 commits
    • perf/x86/intel: Disable userspace RDPMC usage for large PEBS · 1af22eba
      Committed by Kan Liang
      Userspace RDPMC cannot possibly work for large PEBS, which was introduced in:
      
        b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
      
      When the PEBS interrupt threshold is larger than one, there is no way
      to get exact auto-reload times and value for userspace RDPMC.  Disable
      the userspace RDPMC usage when large PEBS is enabled.
      
      The only exception is when the PEBS interrupt threshold is 1, in which
      case user-space RDPMC works well even with auto-reload events.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
      Link: http://lkml.kernel.org/r/1518474035-21006-6-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1af22eba
    • perf/x86: Introduce a ->read() callback in 'struct x86_pmu' · bcfbe5c4
      Committed by Kan Liang
      Auto-reload needs to be specially handled when reading event counts.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/1518474035-21006-3-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bcfbe5c4
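      A sketch of how the generic read path could dispatch to the new callback when
      one is provided (illustrative, not the literal diff):

        static void x86_pmu_read(struct perf_event *event)
        {
                if (x86_pmu.read) {
                        x86_pmu.read(event);
                        return;
                }
                x86_perf_event_update(event);
        }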
    • perf/x86/intel: Fix event update for auto-reload · d31fc13f
      Committed by Kan Liang
      There is a bug when reading event->count with large PEBS enabled.
      
      Here is an example:
      
        # ./read_count
        0x71f0
        0x122c0
        0x1000000001c54
        0x100000001257d
        0x200000000bdc5
      
      In fixed period mode, the auto-reload mechanism could be enabled for
      PEBS events, but the calculation of event->count does not take the
      auto-reload values into account.
      
      Anyone who reads event->count will get the wrong result, e.g x86_pmu_read().
      
      This bug was introduced with the auto-reload mechanism enabled since
      commit:
      
        851559e3 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")
      
      Introduce intel_pmu_save_and_restart_reload() to calculate the
      event->count only for auto-reload.
      
      Since the counter increments a negative counter value and overflows on
      the sign switch, giving the interval:
      
              [-period, 0]
      
      the difference between two consecutive reads is:
      
       A) value2 - value1;
          when no overflows have happened in between,
       B) (0 - value1) + (value2 - (-period));
          when one overflow happened in between,
       C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
          when @n overflows happened in between.
      
      Here A) is the obvious difference, B) is the extension to the discrete
      interval, where the first term is to the top of the interval and the
      second term is from the bottom of the next interval and C) the extension
      to multiple intervals, where the middle term is the whole intervals
      covered.
      
      The equation for all cases is:
      
          value2 - value1 + n * period
      
      Previously, event->count was updated right before the sample output. But for
      case A), there is no PEBS record ready, so it needs to be handled specially.
      
      Remove the auto-reload code from x86_perf_event_set_period() since we'll no
      longer call that function in this case.
      
      Based-on-code-from: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Fixes: 851559e3 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")
      Link: http://lkml.kernel.org/r/1518474035-21006-2-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d31fc13f
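      In code form, the update performed by the new helper boils down to the
      following sketch (variable names assumed; prev/new are sign-extended counter
      snapshots in the [-period, 0] interval and n is the number of auto-reloads
      observed since the last update):

        s64 new = ((s64)(new_raw_count << shift) >> shift);
        s64 old = ((s64)(prev_raw_count << shift) >> shift);

        /* value2 - value1 + n * period */
        local64_add(new - old + n * period, &event->count);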
  21. 17 December 2017, 1 commit
  22. 25 October 2017, 1 commit
    • locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Committed by Mark Rutland
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6aa7de05
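      As a concrete (illustrative) example of what the script produces, a read and
      a write through ACCESS_ONCE():

        -       val = ACCESS_ONCE(ctx->flag);
        -       ACCESS_ONCE(ctx->flag) = 1;

      become:

        +       val = READ_ONCE(ctx->flag);
        +       WRITE_ONCE(ctx->flag, 1);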
  23. 24 October 2017, 1 commit
  24. 29 August 2017, 2 commits
  25. 25 August 2017, 1 commit
    • perf/x86: Export some PMU attributes in caps/ directory · b00233b5
      Committed by Andi Kleen
      It can be difficult for user programs to figure out what features the x86
      CPU PMU driver actually supports. Currently it requires grepping in dmesg,
      but dmesg is not always available.
      
      This adds a caps directory to /sys/bus/event_source/devices/cpu/,
      similar to the caps already used on intel_pt, which can be used to
      discover the available capabilities cleanly.
      
      Three capabilities are defined:
      
       - pmu_name:	Underlying CPU name known to the driver
       - max_precise:	Max precise level supported
       - branches:	Known depth of LBR.
      
      Example:
      
        % grep . /sys/bus/event_source/devices/cpu/caps/*
        /sys/bus/event_source/devices/cpu/caps/branches:32
        /sys/bus/event_source/devices/cpu/caps/max_precise:3
        /sys/bus/event_source/devices/cpu/caps/pmu_name:skylake
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170822185201.9261-3-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b00233b5
  26. 10 August 2017, 1 commit
  27. 20 July 2017, 1 commit
    • perf/x86: Shut up false-positive -Wmaybe-uninitialized warning · 11d8b058
      Committed by Arnd Bergmann
      The initialization function checks for various failure scenarios, but
      unfortunately the compiler gets a little confused about the possible
      combinations, leading to a false-positive build warning when
      -Wmaybe-uninitialized is set:
      
        arch/x86/events/core.c: In function ‘init_hw_perf_events’:
        arch/x86/events/core.c:264:3: warning: ‘reg_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        arch/x86/events/core.c:264:3: warning: ‘val_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized]
           pr_err(FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x is %Lx)\n",
      
      We can't actually run into this case, so this shuts up the warning
      by initializing the variables to a known-invalid state.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-2-arnd@arndb.de
      Link: https://patchwork.kernel.org/patch/9392595/
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      11d8b058
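      Sketch of the workaround (variable names from the warning above; the exact
      types used in the kernel may differ):

        int reg_fail = -1;
        u64 val_fail = -1;

      Starting from a known-invalid value keeps both variables defined on every
      path, silencing the false positive without changing behaviour.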
  28. 08 June 2017, 1 commit
  29. 05 June 2017, 1 commit
    • x86/mm: Rework lazy TLB to track the actual loaded mm · 3d28ebce
      Committed by Andy Lutomirski
      Lazy TLB state is currently managed in a rather baroque manner.
      AFAICT, there are three possible states:
      
       - Non-lazy.  This means that we're running a user thread or a
         kernel thread that has called use_mm().  current->mm ==
         current->active_mm == cpu_tlbstate.active_mm and
         cpu_tlbstate.state == TLBSTATE_OK.
      
       - Lazy with user mm.  We're running a kernel thread without an mm
         and we're borrowing an mm_struct.  We have current->mm == NULL,
         current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state
         != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0).  The current cpu is set
         in mm_cpumask(current->active_mm).  CR3 points to
         current->active_mm->pgd.  The TLB is up to date.
      
       - Lazy with init_mm.  This happens when we call leave_mm().  We
         have current->mm == NULL, current->active_mm ==
         cpu_tlbstate.active_mm, but that mm is only relevant insofar as
         the scheduler is tracking it for refcounting.  cpu_tlbstate.state
         != TLBSTATE_OK.  The current cpu is clear in
         mm_cpumask(current->active_mm).  CR3 points to swapper_pg_dir,
         i.e. init_mm->pgd.
      
      This patch simplifies the situation.  Other than perf, x86 stops
      caring about current->active_mm at all.  We have
      cpu_tlbstate.loaded_mm pointing to the mm that CR3 references.  The
      TLB is always up to date for that mm.  leave_mm() just switches us
      to init_mm.  There are no longer any special cases for mm_cpumask,
      and switch_mm() switches mms without worrying about laziness.
      
      After this patch, cpu_tlbstate.state serves only to tell the TLB
      flush code whether it may switch to init_mm instead of doing a
      normal flush.
      
      This makes fairly extensive changes to xen_exit_mmap(), which used
      to look a bit like black magic.
      
      Perf is unchanged.  With or without this change, perf may behave a bit
      erratically if it tries to read user memory in kernel thread context.
      We should build on this patch to teach perf to never look at user
      memory when cpu_tlbstate.loaded_mm != current->mm.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3d28ebce
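      A sketch of the simplified per-CPU state after this change (field names taken
      from the description above; the real structure carries more members):

        struct tlb_state {
                struct mm_struct *loaded_mm;    /* the mm that CR3 currently references */
                int state;                      /* may the flush code switch to init_mm? */
        };
        DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate);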
  30. 26 May 2017, 1 commit