1. 05 July 2021, 1 commit
    • s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility · a029a4ea
      Committed by Thomas Richter
      Commit cf6acb8b ("s390/cpumf: Add support for complete counter set extraction")
      allows access to the CPU Measurement Counter Facility via the character
      device /dev/hwctr. Access was exclusive: either via this device or
      via the perf_event_open() system call, but only one path at a time.
      The CPU Measurement Counter Facility device driver blocked access
      by other processes.
      
      This patch removes this restriction and allows concurrent access to
      the CPU Measurement Counter Facility from multiple processes at the
      same time via the perf_event_open() SVC and via the /dev/hwctr device.
      Access via the /dev/hwctr device itself remains exclusive: only one
      process at a time is allowed to open it.
      
      This patch
      - moves the /dev/hwctr device access from file perf_cpum_cf_diag.c
        to file perf_cpum_cf.c.
      - uses only one trace buffer .../s390dbf/cpum_cf.
      - removes the cfset_csd structure and merges its members into the
        structure cpu_cf_events. This results in one data structure and
        simplifies the access.
      - reworks the function family ctr_set_enable, ctr_set_disable,
        ctr_set_start and ctr_set_stop, which used to operate on a counter
        set number. They now operate on a counter set bit mask.
      - moves the CF_DIAG event functionality to file perf_cpum_cf.c, which
        now contains the complete functionality of the CPU Measurement
        Counter Facility:
        - Performance measurement support for counters using perf stat.
        - Support for complete counter set extraction with device /dev/hwctr.
        - Support for counter set extraction event CF_DIAG attached to
          samples using perf record.
      - removes file perf_cpum_cf_diag.c.
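      The bit-mask rework described above can be sketched as follows. This
      is a minimal userspace model, not the actual kernel code: the set
      names and helper signatures are assumptions chosen for illustration;
      the point is that one call can now enable or disable several counter
      sets at once.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical counter-set bit positions (illustrative only). */
      #define CPUMF_CTR_SET_BASIC   0
      #define CPUMF_CTR_SET_USER    1
      #define CPUMF_CTR_SET_CRYPTO  2

      /* Each counter set is one bit in the mask, so the ctr_set_* family
       * can act on several sets per call instead of one set number. */
      static inline void ctr_set_enable(uint64_t *state, uint64_t ctrsets)
      {
              *state |= ctrsets;          /* turn the requested sets on */
      }

      static inline void ctr_set_disable(uint64_t *state, uint64_t ctrsets)
      {
              *state &= ~ctrsets;         /* turn the requested sets off */
      }

      int main(void)
      {
              uint64_t state = 0;

              /* Enable two sets with a single call via a bit mask. */
              ctr_set_enable(&state, (1ULL << CPUMF_CTR_SET_BASIC) |
                                     (1ULL << CPUMF_CTR_SET_CRYPTO));
              assert(state == 0x5);
              ctr_set_disable(&state, 1ULL << CPUMF_CTR_SET_CRYPTO);
              assert(state == 0x1);
              return 0;
      }
      ```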
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
  2. 30 June 2021, 1 commit
  3. 26 April 2021, 1 commit
    • dmaengine: idxd: Add IDXD performance monitor support · 81dd4d4d
      Committed by Tom Zanussi
      Implement the IDXD performance monitor capability (named 'perfmon' in
      the DSA (Data Streaming Accelerator) spec [1]), which supports the
      collection of information about key events occurring during DSA and
      IAX (Intel Analytics Accelerator) device execution, to assist in
      performance tuning and debugging.
      
      The idxd perfmon support is implemented as part of the IDXD driver and
      interfaces with the Linux perf framework.  It has several features in
      common with the existing uncore pmu support:
      
        - it does not support sampling
        - does not support per-thread counting
      
      However it also has some unique features not present in the core and
      uncore support:
      
        - all general-purpose counters are identical, thus no event constraints
        - operation is always system-wide
      
      While the core perf subsystem assumes that all counters are by default
      per-cpu, the uncore pmus are socket-scoped and use a cpu mask to
      restrict counting to one cpu from each socket.  IDXD counters use a
      similar strategy but expand the scope even further; since IDXD
      counters are system-wide and can be read from any cpu, the IDXD perf
      driver picks a single cpu to do the work (with cpu hotplug notifiers
      to choose a different cpu if the chosen one is taken off-line).
      
      More specifically, the perf userspace tool by default opens a counter
      for each cpu for an event.  However, if it finds a cpumask file
      associated with the pmu under sysfs, as is the case with the uncore
      pmus, it will open counters only on the cpus specified by the cpumask.
      Since perfmon only needs to open a single counter per event for a
      given IDXD device, the perfmon driver will create a sysfs cpumask file
      for the device and insert the first cpu of the system into it.  When a
      user uses perf to open an event, perf will open a single counter on
      the cpu specified by the cpu mask.  This amounts to the default
      system-wide rather than per-cpu counting mentioned previously for
      perfmon pmu events.  In order to keep the cpu mask up-to-date, the
      driver implements cpu hotplug support for multiple devices, as IDXD
      usually enumerates and registers more than one idxd device.
      
      The perfmon driver implements basic perfmon hardware capability
      discovery and configuration, and is initialized by the IDXD driver's
      probe function.  During initialization, the driver retrieves the total
      number of supported performance counters, the pmu ID, and the device
      type from the idxd device, and registers itself under the Linux perf
      framework.
      
      The perf userspace tool can be used to monitor single or multiple
      events depending on the given configuration, as well as event groups,
      which are also supported by the perfmon driver.  The user configures
      events using the perf tool command-line interface by specifying the
      event and corresponding event category, along with an optional set of
      filters that can be used to restrict counting to specific work queues,
      traffic classes, page and transfer sizes, and engines (See [1] for
      specifics).
      
      With the configuration specified by the user, the perf tool issues a
      system call passing that information to the kernel, which uses it to
      initialize the specified event(s).  The event(s) are opened and
      started, and following termination of the perf command, they're
      stopped.  At that point, the perfmon driver will read the latest count
      for the event(s), calculate the difference between the latest counter
      values and previously tracked counter values, and display the final
      incremental count as the event count for the cycle.  An overflow
      handler registered on the IDXD irq path is used to account for counter
      overflows, which are signaled by an overflow interrupt.
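      The delta bookkeeping described above can be sketched in miniature.
      This is an illustrative model, not the driver's code: `counter_delta`
      is a hypothetical helper, and 32-bit wrapping counters are an
      assumption made so the overflow case is visible.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Sketch: incremental count between two reads of a 32-bit hardware
       * counter, accounting for at most one wraparound (the overflow
       * interrupt mentioned above would account for further wraps). */
      static uint64_t counter_delta(uint32_t prev, uint32_t now)
      {
              /* Unsigned subtraction wraps modulo 2^32, which is exactly
               * the value lost when the counter overflowed once. */
              return (uint32_t)(now - prev);
      }

      int main(void)
      {
              assert(counter_delta(100, 119) == 19);            /* normal case */
              assert(counter_delta(0xFFFFFFF0u, 0x10) == 0x20); /* wrapped once */
              return 0;
      }
      ```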
      
      Below are a couple of examples of perf usage for monitoring DSA events.
      
      The following monitors all events in the 'engine' category.  Because
      no filters are specified, this captures all engine events for the
      workload, which in this case is 19 iterations of the work generated by
      the kernel dmatest module.
      
      Details describing the events can be found in Appendix D of [1],
      Performance Monitoring Events, but briefly they are:
      
        event 0x1:  total input data processed, in 32-byte units
        event 0x2:  total data written, in 32-byte units
        event 0x4:  number of work descriptors that read the source
        event 0x8:  number of work descriptors that write the destination
        event 0x10: number of work descriptors dispatched from batch descriptors
        event 0x20: number of work descriptors dispatched from work queues
      
       # perf stat -e dsa0/event=0x1,event_category=0x1/,
                      dsa0/event=0x2,event_category=0x1/,
                      dsa0/event=0x4,event_category=0x1/,
                      dsa0/event=0x8,event_category=0x1/,
                      dsa0/event=0x10,event_category=0x1/,
                      dsa0/event=0x20,event_category=0x1/
                        modprobe dmatest channel=dma0chan0 timeout=2000
                        iterations=19 run=1 wait=1
      
           Performance counter stats for 'system wide':
      
                       5,332      dsa0/event=0x1,event_category=0x1/
                       5,327      dsa0/event=0x2,event_category=0x1/
                          19      dsa0/event=0x4,event_category=0x1/
                          19      dsa0/event=0x8,event_category=0x1/
                           0      dsa0/event=0x10,event_category=0x1/
                          19      dsa0/event=0x20,event_category=0x1/
      
                21.977436186 seconds time elapsed
      
      The command below illustrates filter usage with a simple example.  It
      specifies that MEM_MOVE operations should be counted for the DSA
      device dsa0 (event 0x8 corresponds to the EV_MEM_MOVE event - Number
      of Memory Move Descriptors, which is part of event category 0x3 -
      Operations. The detailed category and event IDs are available in
      Appendix D, Performance Monitoring Events, of [1]).  In addition to
      the event and event category, a number of filters are also specified
      (the detailed filter values are available in Chapter 6.4 (Filter
      Support) of [1]), which will restrict counting to only those events
      that meet all of the filter criteria.  In this case, the filters
      specify that only MEM_MOVE operations that are serviced by work queue
      wq0 and specifically engine number engine0 and traffic class tc0
      having sizes between 0 and 4k and a page size between 0 and 1G result
      in a counter hit; anything else will be filtered out and not appear in
      the final count.  Note that filters are optional - any filter not
      specified is assumed to be all ones and will pass anything.
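      One plausible reading of the "all ones passes anything" rule is that
      each filter is a bit mask over the corresponding resource IDs (bit i
      of filter_wq set means "count events from work queue i"). The sketch
      below models that interpretation only; the real filter semantics are
      defined in Chapter 6.4 of [1], and this model is an assumption, not
      taken from the spec.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical model: an event is counted only if every filter's
       * bit for the event's attribute is set; a filter left at all ones
       * therefore passes anything. */
      static bool filter_matches(uint32_t filter_wq, uint32_t filter_eng,
                                 unsigned int wq, unsigned int eng)
      {
              return ((filter_wq  >> wq)  & 1) &&
                     ((filter_eng >> eng) & 1);
      }

      int main(void)
      {
              /* filter_wq=0x1: only wq0; filter_eng=0x1: only engine0. */
              assert(filter_matches(0x1, 0x1, 0, 0));
              assert(!filter_matches(0x1, 0x1, 1, 0));
              /* Unspecified filter == all ones: everything passes. */
              assert(filter_matches(UINT32_MAX, UINT32_MAX, 7, 3));
              return 0;
      }
      ```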
      
       # perf stat -e dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7,
                      filter_eng=0x1,event=0x8,event_category=0x3/
                        modprobe dmatest channel=dma0chan0 timeout=2000
                        iterations=19 run=1 wait=1
      
           Performance counter stats for 'system wide':
      
             19      dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7,
                     filter_eng=0x1,event=0x8,event_category=0x3/
      
                21.865914091 seconds time elapsed
      
      The output above reflects that the unspecified workload resulted in
      the counting of 19 MEM_MOVE operation events that met the filter
      criteria.
      
      [1]: https://software.intel.com/content/www/us/en/develop/download/intel-data-streaming-accelerator-preliminary-architecture-specification.html
      
      [ Based on work originally by Jing Lin. ]
      Reviewed-by: Dave Jiang <dave.jiang@intel.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Link: https://lore.kernel.org/r/0c5080a7d541904c4ad42b848c76a1ce056ddac7.1619276133.git.zanussi@kernel.org
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
  4. 08 April 2021, 2 commits
  5. 07 April 2021, 2 commits
  6. 25 March 2021, 2 commits
  7. 24 February 2021, 1 commit
    • s390/cpumf: Add support for complete counter set extraction · cf6acb8b
      Committed by Thomas Richter
      Add support to the CPU Measurement counter facility device driver
      to extract complete counter sets per CPU and per counter set from user
      space. This includes a new device named /dev/hwctr and support
      for the device driver functions open, close and ioctl. Other
      functions are not supported.
      
      The ioctl command supports 3 subcommands:
      S390_HWCTR_START: enables counter sets on a list of CPUs.
      S390_HWCTR_STOP: disables counter sets on a list of CPUs.
      S390_HWCTR_READ: reads counter sets on a list of CPUs.
      
      The ioctl(..., S390_HWCTR_READ, ...) is the only subcommand which
      returns data.  It requires member data_bytes to be positive; the value
      indicates the maximum amount of space available to store counter set
      data. The other ioctl() subcommands do not use this member and it
      should be set to zero.
      The S390_HWCTR_READ subcommand returns the following data:
      
      The cpuset data is flattened using the following scheme, stored in member
      data:
      
       0x0       0x8   0xc       0x10  0x10      0x18  0x20  0x28         0xU-1
       +---------+-----+---------+-----+---------+-----+-----+------+------+
       | no_cpus | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n |
       +---------+-----+---------+-----+---------+-----+-----+------+------+
      
                                 0xU   0xU+4     0xU+8 0xU+10             0xV-1
                                 +-----+---------+-----+-----+------+------+
                                 | set | no_cnts | cv1 | cv2 | .... | cv_n |
                                 +-----+---------+-----+-----+------+------+
      
                 0xV   0xV+4     0xV+8 0xV+c
                 +-----+---------+-----+---------+-----+-----+------+------+
                 | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n |
                 +-----+---------+-----+---------+-----+-----+------+------+
      
      U and V denote arbitrary hexadecimal addresses.
      The first integer represents the number of CPUs data was extracted
      from. This is followed by the CPU number and the number of counter
      sets extracted, both integer values. This is followed by the set
      identifier and the number of counters extracted, again both integer
      values. This is followed by the counter values, each element eight
      bytes in size.
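      A consumer of this flattened layout can be sketched as follows. This
      is a minimal userspace walk of the buffer under the assumptions that
      each header field is a 4-byte integer and each counter value is 8
      bytes, as described above; `count_counters` is an illustrative name,
      and error handling is omitted.

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <string.h>

      /* Sketch: walk the flattened S390_HWCTR_READ buffer and return the
       * total number of 8-byte counter values it contains. */
      static unsigned int count_counters(const unsigned char *p)
      {
              uint32_t no_cpus, no_sets, no_cnts;
              unsigned int total = 0;

              memcpy(&no_cpus, p, 4); p += 4;
              for (uint32_t c = 0; c < no_cpus; c++) {
                      p += 4;                         /* skip cpu number */
                      memcpy(&no_sets, p, 4); p += 4;
                      for (uint32_t s = 0; s < no_sets; s++) {
                              p += 4;                 /* skip set identifier */
                              memcpy(&no_cnts, p, 4); p += 4;
                              total += no_cnts;
                              p += (size_t)no_cnts * 8; /* skip counter values */
                      }
              }
              return total;
      }

      int main(void)
      {
              /* One CPU, one counter set, two counter values. */
              unsigned char buf[4 * 5 + 8 * 2] = { 0 };
              uint32_t hdr[5] = { 1 /*no_cpus*/, 0 /*cpu*/, 1 /*no_sets*/,
                                  3 /*set*/, 2 /*no_cnts*/ };
              memcpy(buf, hdr, sizeof(hdr));
              assert(count_counters(buf) == 2);
              return 0;
      }
      ```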
      
      The S390_HWCTR_READ ioctl subcommand is also limited to one call per
      minute. This ensures that an application does not read out the
      counter sets too often, which would reduce overall CPU performance:
      the complete counter set extraction is an expensive operation.
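      The once-per-minute gate can be sketched in userspace terms (the
      kernel would use jiffies; `READ_INTERVAL` and `read_allowed` are
      assumed names, not the driver's):

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <time.h>

      #define READ_INTERVAL 60        /* seconds between permitted reads (assumed) */

      /* Sketch: permit a read only if at least READ_INTERVAL seconds have
       * passed since the last permitted one. */
      static bool read_allowed(time_t *last_read, time_t now)
      {
              if (*last_read != 0 && now - *last_read < READ_INTERVAL)
                      return false;   /* too soon: reject the read */
              *last_read = now;
              return true;
      }

      int main(void)
      {
              time_t last = 0;
              assert(read_allowed(&last, 1000));   /* first read passes   */
              assert(!read_allowed(&last, 1030));  /* 30 s later: denied  */
              assert(read_allowed(&last, 1061));   /* 61 s later: allowed */
              return 0;
      }
      ```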
      Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
  8. 12 January 2021, 1 commit
    • csky: Fixup perf probe failed · 398cb924
      Committed by Guo Ren
      Currently, perf init fails with:
      [    1.452433] csky-pmu: probe of soc:pmu failed with error -16
      
      This patch fixes it by adding CPUHP_AP_PERF_CSKY_ONLINE to
      cpuhotplug.h.
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
  9. 23 December 2020, 1 commit
    • powercap/drivers/dtpm: Add CPU energy model based support · 0e8f68d7
      Committed by Daniel Lezcano
      With the powercap dtpm controller, we are able to plug devices with
      power limitation features in the tree.
      
      The following patch introduces the CPU power limitation based on the
      energy model and the performance states.
      
      The power limitation is done at the performance domain level. If some
      CPUs are unplugged, the corresponding power will be subtracted from
      the performance domain total power.
      
      It is up to the platform to initialize the dtpm tree and add the CPU.
      
      Here is an example to create a simple tree with one root node called
      "pkg" and the CPU's performance domains.
      
      static int dtpm_register_pkg(struct dtpm_descr *descr)
      {
      	struct dtpm *pkg;
      	int ret;
      
      	pkg = dtpm_alloc(NULL);
      	if (!pkg)
      		return -ENOMEM;
      
      	ret = dtpm_register(descr->name, pkg, descr->parent);
      	if (ret)
      		return ret;
      
      	return dtpm_register_cpu(pkg);
      }
      
      static struct dtpm_descr descr = {
      	.name = "pkg",
      	.init = dtpm_register_pkg,
      };
      DTPM_DECLARE(descr);
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Tested-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  10. 11 November 2020, 1 commit
  11. 07 October 2020, 1 commit
  12. 01 October 2020, 1 commit
  13. 18 September 2020, 1 commit
    • arm64: paravirt: Initialize steal time when cpu is online · 75df529b
      Committed by Andrew Jones
      Steal time initialization requires mapping a memory region which
      invokes a memory allocation. Doing this at CPU starting time results
      in the following trace when CONFIG_DEBUG_ATOMIC_SLEEP is enabled:
      
      BUG: sleeping function called from invalid context at mm/slab.h:498
      in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc5+ #1
      Call trace:
       dump_backtrace+0x0/0x208
       show_stack+0x1c/0x28
       dump_stack+0xc4/0x11c
       ___might_sleep+0xf8/0x130
       __might_sleep+0x58/0x90
       slab_pre_alloc_hook.constprop.101+0xd0/0x118
       kmem_cache_alloc_node_trace+0x84/0x270
       __get_vm_area_node+0x88/0x210
       get_vm_area_caller+0x38/0x40
       __ioremap_caller+0x70/0xf8
       ioremap_cache+0x78/0xb0
       memremap+0x9c/0x1a8
       init_stolen_time_cpu+0x54/0xf0
       cpuhp_invoke_callback+0xa8/0x720
       notify_cpu_starting+0xc8/0xd8
       secondary_start_kernel+0x114/0x180
      CPU1: Booted secondary processor 0x0000000001 [0x431f0a11]
      
      However we don't need to initialize steal time at CPU starting time.
      We can simply wait until CPU online time, just sacrificing a bit of
      accuracy by returning zero for steal time until we know better.
      
      While at it, add __init to the functions that are only called by
      pv_time_init() which is __init.
      Signed-off-by: Andrew Jones <drjones@redhat.com>
      Fixes: e0685fa2 ("arm64: Retrieve stolen time as paravirtualized guest")
      Cc: stable@vger.kernel.org
      Reviewed-by: Steven Price <steven.price@arm.com>
      Link: https://lore.kernel.org/r/20200916154530.40809-1-drjones@redhat.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  14. 21 August 2020, 1 commit
  15. 16 July 2020, 1 commit
  16. 10 June 2020, 1 commit
    • irqchip: RISC-V per-HART local interrupt controller driver · 6b7ce892
      Committed by Anup Patel
      The RISC-V per-HART local interrupt controller manages software
      interrupts, timer interrupts, external interrupts (which are routed
      via the platform level interrupt controller) and other per-HART
      local interrupts.
      
      We add a driver for the RISC-V local interrupt controller, which
      eventually replaces the RISC-V architecture code, allowing for a
      better split between arch code and drivers.
      
      The driver is compliant with RISC-V Hart-Level Interrupt Controller
      DT bindings located at:
      Documentation/devicetree/bindings/interrupt-controller/riscv,cpu-intc.txt
      Co-developed-by: Palmer Dabbelt <palmer@dabbelt.com>
      Signed-off-by: Anup Patel <anup.patel@wdc.com>
      [Palmer: Cleaned up warnings]
      Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
  17. 30 May 2020, 1 commit
    • blk-mq: drain I/O when all CPUs in a hctx are offline · bf0beec0
      Committed by Ming Lei
      Most of blk-mq drivers depend on managed IRQ's auto-affinity to setup
      up queue mapping. Thomas mentioned the following point[1]:
      
      "That was the constraint of managed interrupts from the very beginning:
      
       The driver/subsystem has to quiesce the interrupt line and the associated
       queue _before_ it gets shutdown in CPU unplug and not fiddle with it
       until it's restarted by the core when the CPU is plugged in again."
      
      However, current blk-mq implementation doesn't quiesce hw queue before
      the last CPU in the hctx is shutdown.  Even worse, CPUHP_BLK_MQ_DEAD is a
      cpuhp state handled after the CPU is down, so there isn't any chance to
      quiesce the hctx before shutting down the CPU.
      
      Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs
      where the last CPU goes away, and wait for completion of in-flight
      requests.  This guarantees that there is no inflight I/O before shutting
      down the managed IRQ.
      
      Add a BLK_MQ_F_STACKING flag and set it for dm-rq and loop, so we don't
      need to wait for completion of in-flight requests from these drivers to
      avoid a potential deadlock. This is safe for stacking drivers because
      they do not use interrupts at all and their I/O completions are
      triggered by the underlying devices' I/O completions.
      
      [1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
      
      [hch: different retry mechanism, merged two patches, minor cleanups]
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  18. 19 May 2020, 1 commit
  19. 16 March 2020, 1 commit
  20. 02 January 2020, 1 commit
  21. 11 December 2019, 1 commit
    • padata: validate cpumask without removed CPU during offline · 894c9ef9
      Committed by Daniel Jordan
      Configuring an instance's parallel mask without any online CPUs...
      
        echo 2 > /sys/kernel/pcrypt/pencrypt/parallel_cpumask
        echo 0 > /sys/devices/system/cpu/cpu1/online
      
      ...makes tcrypt mode=215 crash like this:
      
        divide error: 0000 [#1] SMP PTI
        CPU: 4 PID: 283 Comm: modprobe Not tainted 5.4.0-rc8-padata-doc-v2+ #2
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191013_105130-anatol 04/01/2014
        RIP: 0010:padata_do_parallel+0x114/0x300
        Call Trace:
         pcrypt_aead_encrypt+0xc0/0xd0 [pcrypt]
         crypto_aead_encrypt+0x1f/0x30
         do_mult_aead_op+0x4e/0xdf [tcrypt]
         test_mb_aead_speed.constprop.0.cold+0x226/0x564 [tcrypt]
         do_test+0x28c2/0x4d49 [tcrypt]
         tcrypt_mod_init+0x55/0x1000 [tcrypt]
         ...
      
      cpumask_weight() in padata_cpu_hash() returns 0 because the mask has no
      CPUs.  The problem is __padata_remove_cpu() checks for valid masks too
      early and so doesn't mark the instance PADATA_INVALID as expected, which
      would have made padata_do_parallel() return error before doing the
      division.
      
      Fix by introducing a second padata CPU hotplug state before
      CPUHP_BRINGUP_CPU so that __padata_remove_cpu() sees the online mask
      without @cpu.  No need for the second argument to padata_replace() since
      @cpu is now already missing from the online mask.
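      The failure mode is easy to see in miniature: the CPU-slot hash
      divides by the mask weight, so an empty mask must be rejected before
      the modulo. The names below are illustrative stand-ins, not the
      actual padata code.

      ```c
      #include <assert.h>

      /* Sketch of the hazard: a CPU slot is picked as seq_nr modulo the
       * number of CPUs in the parallel mask.  With zero online CPUs in the
       * mask this is a division by zero, which is why the instance must be
       * marked invalid (PADATA_INVALID) before any job reaches this point. */
      static int padata_cpu_slot(unsigned int seq_nr, unsigned int cpumask_weight)
      {
              if (cpumask_weight == 0)
                      return -1;      /* caller should have failed earlier */
              return (int)(seq_nr % cpumask_weight);
      }

      int main(void)
      {
              assert(padata_cpu_slot(7, 4) == 3);
              assert(padata_cpu_slot(7, 0) == -1); /* empty mask: no crash */
              return 0;
      }
      ```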
      
      Fixes: 33e54450 ("padata: Handle empty padata cpumasks")
      Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-crypto@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  22. 15 November 2019, 1 commit
    • x86/hyperv: Initialize clockevents earlier in CPU onlining · 4df4cb9e
      Committed by Michael Kelley
      Hyper-V has historically initialized stimer-based clockevents late in the
      process of onlining a CPU because clockevents depend on stimer
      interrupts. In the original Hyper-V design, stimer interrupts generate a
      VMbus message, so the VMbus machinery must be running first, and VMbus
      can't be initialized until relatively late. On x86/64, LAPIC timer based
      clockevents are used during early initialization before VMbus and
      stimer-based clockevents are ready, and again during CPU offlining after
      the stimer clockevents have been shut down.
      
      Unfortunately, this design creates problems when offlining CPUs for
      hibernation or other purposes. stimer-based clockevents are shut down
      relatively early in the offlining process, so clockevents_unbind_device()
      must be used to fallback to the LAPIC-based clockevents for the remainder
      of the offlining process.  Furthermore, the late initialization and early
      shutdown of stimer-based clockevents doesn't work well on ARM64 since there
      is no other timer like the LAPIC to fallback to. So CPU onlining and
      offlining doesn't work properly.
      
      Fix this by recognizing that stimer Direct Mode is the normal path for
      newer versions of Hyper-V on x86/64, and the only path on other
      architectures. With stimer Direct Mode, stimer interrupts don't require any
      VMbus machinery. stimer clockevents can be initialized and shut down
      consistent with how it is done for other clockevent devices. While the old
      VMbus-based stimer interrupts must still be supported for backward
      compatibility on x86, that mode of operation can be treated as legacy.
      
      So add a new Hyper-V stimer entry in the CPU hotplug state list, and use
      that new state when in Direct Mode. Update the Hyper-V clocksource driver
      to allocate and initialize stimer clockevents earlier during boot. Update
      Hyper-V initialization and the VMbus driver to use this new design. As a
      result, the LAPIC timer is no longer used during boot or CPU
      onlining/offlining and clockevents_unbind_device() is not called.  But
      retain the old design as a legacy implementation for older versions of
      Hyper-V that don't support Direct Mode.
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Dexuan Cui <decui@microsoft.com>
      Link: https://lkml.kernel.org/r/1573607467-9456-1-git-send-email-mikelley@microsoft.com
  23. 22 October 2019, 1 commit
    • arm64: Retrieve stolen time as paravirtualized guest · e0685fa2
      Committed by Steven Price
      Enable paravirtualization features when running under a hypervisor
      supporting the PV_TIME_ST hypercall.
      
      For each (v)CPU, we ask the hypervisor for the location of a shared
      page which the hypervisor will use to report stolen time to us. We set
      pv_time_ops to the stolen time function which simply reads the stolen
      value from the shared page for a VCPU. We guarantee single-copy
      atomicity using READ_ONCE which means we can also read the stolen
      time for another VCPU than the currently running one while it is
      potentially being updated by the hypervisor.
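      The single-copy-atomicity point can be sketched as follows;
      `pv_steal_time` and the shared-page layout are illustrative stand-ins
      for this note, with READ_ONCE modeled as a volatile access (the
      kernel's macro is more elaborate, but this captures the idea: one
      untorn load the compiler cannot split or repeat).

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Model of READ_ONCE: force a single access through a volatile
       * pointer so the compiler cannot split or duplicate the load. */
      #define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

      /* Illustrative stand-in for the per-vCPU shared page the hypervisor
       * updates with accumulated stolen time. */
      struct pv_time_region {
              uint64_t stolen_time;   /* nanoseconds stolen from this vCPU */
      };

      static uint64_t pv_steal_time(const struct pv_time_region *reg)
      {
              if (!reg)
                      return 0;       /* page not mapped yet: report zero */
              return READ_ONCE(reg->stolen_time);
      }

      int main(void)
      {
              struct pv_time_region reg = { .stolen_time = 12345 };
              assert(pv_steal_time(&reg) == 12345);
              assert(pv_steal_time(NULL) == 0);
              return 0;
      }
      ```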
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  24. 04 July 2019, 1 commit
    • drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT · 83b44fe3
      Committed by James Morse
      The cacheinfo structures are allocated and freed by CPU online/offline
      callbacks. Originally these were only used by sysfs to expose the
      cache topology to user space. Without any in-kernel dependencies
      CPUHP_AP_ONLINE_DYN was an appropriate choice.
      
      resctrl has started using these structures to identify CPUs that
      share a cache. It updates its 'domain' structures from cpu
      online/offline callbacks. These depend on the cacheinfo structures
      (resctrl_online_cpu()->domain_add_cpu()->get_cache_id()->
       get_cpu_cacheinfo()).
      These also run as CPUHP_AP_ONLINE_DYN.
      
      Now that there is an in-kernel dependency, move the cacheinfo
      work earlier so we know it is done before resctrl's
      CPUHP_AP_ONLINE_DYN work runs.
      
      Fixes: 2264d9c7 ("x86/intel_rdt: Build structures for each resource based on cache topology")
      Cc: <stable@vger.kernel.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: James Morse <james.morse@arm.com>
      Link: https://lore.kernel.org/r/20190624173656.202407-1-james.morse@arm.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  25. 26 June 2019, 1 commit
    • clocksource/drivers/exynos_mct: Increase priority over ARM arch timer · 6282edb7
      Committed by Marek Szyprowski
      Exynos SoCs based on CA7/CA15 have 2 timer interfaces: custom Exynos MCT
      (Multi Core Timer) and standard ARM Architected Timers.
      
      There are use cases where both timer interfaces are used simultaneously.
      One of such examples is using Exynos MCT for the main system timer and
      ARM Architected Timers for the KVM and virtualized guests (KVM requires
      arch timers).
      
      The Exynos Multi-Core Timer driver (exynos_mct) must, however, be
      started before the ARM Architected Timers (arch_timer), because both
      share some common hardware blocks (the global system counter) and
      turning on the MCT is needed to get the ARM Architected Timer working
      properly.
      
      To ensure selecting Exynos MCT as the main system timer, increase MCT
      timer rating. To ensure proper starting order of both timers during
      suspend/resume cycle, increase MCT hotplug priority over ARM Architected
      Timers.
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
      Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
  26. 15 June 2019, 1 commit
    • x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback · 78f4e932
      Committed by Borislav Petkov
      Adric Blake reported the following warning during suspend-resume:
      
        Enabling non-boot CPUs ...
        x86: Booting SMP configuration:
        smpboot: Booting Node 0 Processor 1 APIC 0x2
        unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \
         at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20)
        Call Trace:
         intel_set_tfa
         intel_pmu_cpu_starting
         ? x86_pmu_dead_cpu
         x86_pmu_starting_cpu
         cpuhp_invoke_callback
         ? _raw_spin_lock_irqsave
         notify_cpu_starting
         start_secondary
         secondary_startup_64
        microcode: sig=0x806ea, pf=0x80, revision=0x96
        microcode: updated to revision 0xb4, date = 2019-04-01
        CPU1 is up
      
      The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated
      by microcode. The log above shows that the microcode loader callback
      runs after the PMU restoration, leading to the conjecture that,
      because the microcode hasn't been updated yet, the MSR is not present,
      causing the #GP.
      
      Add a microcode loader-specific hotplug vector which comes before
      the PERF vectors and thus executes earlier and makes sure the MSR is
      present.
      
      Fixes: 400816f6 ("perf/x86/intel: Implement support for TSX Force Abort")
      Reported-by: Adric Blake <promarbler14@gmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: x86@kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
  27. 03 May 2019, 1 commit
  28. 08 April 2019, 1 commit
    • PM / arch: x86: Rework the MSR_IA32_ENERGY_PERF_BIAS handling · 5861381d
      Committed by Rafael J. Wysocki
      The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is
      problematic, because it may cause changes made by user space to that
      MSR (with the help of the x86_energy_perf_policy tool, for example)
      to be lost every time a CPU goes offline and then back online as well
      as during system-wide power management transitions into sleep states
      and back into the working state.
      
      The first problem is that if the current EPB value for a CPU going
      online is 0 ('performance'), the kernel will change it to 6 ('normal')
      regardless of whether or not this is the first bring-up of that CPU.
      That also happens during system-wide resume from sleep states
      (including, but not limited to, hibernation).  However, the EPB may
      have been adjusted by user space this way and the kernel should not
      blindly override that setting.
      
      The second problem is that if the platform firmware resets the EPB
      values for any CPUs during system-wide resume from a sleep state,
      the kernel will not restore their previous EPB values that may
      have been set by user space before the preceding system-wide
      suspend transition.  Again, that behavior may at least be confusing
      from the user space perspective.
      
      In order to address these issues, rework the handling of
      MSR_IA32_ENERGY_PERF_BIAS so that the EPB value is saved on CPU
      offline and restored on CPU online as well as (for the boot CPU)
      during the syscore stages of system-wide suspend and resume
      transitions, respectively.
      
      However, retain the policy by which the EPB is set to 6 ('normal')
      on the first bring-up of each CPU if its initial value is 0, based
      on the observation that 0 may mean 'not initialized' just as well as
      'performance' in that case.
      
      While at it, move the MSR_IA32_ENERGY_PERF_BIAS handling code into
      a separate file and document it in Documentation/admin-guide.
      
      Fixes: abe48b10 (x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS)
      Fixes: b51ef52d (x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume)
      Reported-by: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      5861381d
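The save/restore policy described above can be modeled in a few lines of plain C. This is only a sketch of the logic, not the kernel code: `struct epb_state`, `epb_online_value()` and `epb_offline()` are invented names, and the real implementation reads and writes MSR_IA32_ENERGY_PERF_BIAS in the separate file the commit introduces.

```c
#include <assert.h>

#define EPB_NORMAL 6 /* 'normal' energy/performance bias */

struct epb_state {
	int saved;       /* EPB value saved when the CPU went offline */
	int initialized; /* set after the first bring-up of the CPU   */
};

/* Value to program on CPU online: on the first bring-up, a hardware
 * value of 0 is treated as "not initialized" and replaced with
 * 'normal'; on every later online, the value saved at offline time
 * is restored, so user-space settings survive. */
static int epb_online_value(struct epb_state *s, int hw_value)
{
	if (!s->initialized) {
		s->initialized = 1;
		return hw_value == 0 ? EPB_NORMAL : hw_value;
	}
	return s->saved;
}

/* Save the current EPB when the CPU goes offline (or suspends). */
static void epb_offline(struct epb_state *s, int hw_value)
{
	s->saved = hw_value;
}
```

In this model, a user-space EPB of 0 ('performance') saved at offline time is restored verbatim at the next online, instead of being overwritten with 6, which is exactly the first problem the commit fixes.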
  29. 23 Feb, 2019 1 commit
  30. 06 Dec, 2018 1 commit
  31. 22 Nov, 2018 1 commit
    • H
      drivers/perf: xgene: Add CPU hotplug support · cbb72a3c
      Committed by Hoan Tran
      If the CPU assigned to the xgene PMU is taken offline, then subsequent
      perf invocations on the PMU will fail:
      
        # echo 0 > /sys/devices/system/cpu/cpu0/online
        # perf stat -a -e l3c0/cycle-count/,l3c0/write/ sleep 1
          Error:
          The sys_perf_event_open() syscall returned with 19 (No such device) for event (l3c0/cycle-count/).
          /bin/dmesg may provide additional information.
          No CONFIG_PERF_EVENTS=y kernel support configured?
      
      This patch implements a hotplug notifier in the xgene PMU driver so that
      the PMU context is migrated to another online CPU should its assigned
      CPU disappear.
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Hoan Tran <hoan.tran@amperecomputing.com>
      [will: Made naming of new cpuhp_state enum entry consistent]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      cbb72a3c
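Migrating the PMU context boils down to picking another online CPU when the assigned one disappears; a kernel driver would typically use cpumask_any_but() and perf_pmu_migrate_context() for this. The sketch below models only the target-selection step with a plain 64-bit online mask; the function name and the bitmask representation are invented for illustration.

```c
#include <stdint.h>

/* Model of the hotplug notifier's target selection: pick a new home
 * CPU for the PMU when 'leaving' goes offline. Returns the lowest
 * online CPU other than 'leaving', or -1 if no other CPU is online
 * (in which case the PMU context cannot be migrated). */
static int pick_migration_target(uint64_t online_mask, int leaving)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (cpu != leaving && (online_mask & (1ULL << cpu)))
			return cpu;
	}
	return -1;
}
```

With such a callback registered, offlining cpu0 in the example above would move the PMU context to the chosen CPU instead of leaving subsequent perf invocations with "No such device".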
  32. 03 Nov, 2018 1 commit
  33. 13 Aug, 2018 1 commit
  34. 03 Jul, 2018 1 commit
  35. 27 Jun, 2018 1 commit
  36. 16 Mar, 2018 1 commit
    • A
      arch: remove blackfin port · 4ba66a97
      Committed by Arnd Bergmann
      The Analog Devices Blackfin port was added in 2007 and was rather
      active for a while, but all work on it has come to a standstill
      over time, as Analog have changed their product line-up.
      
      Aaron Wu confirmed that the architecture port is no longer relevant,
      and multiple people suggested removing blackfin independently because
      of some of its oddities like a non-working SMP port, and the amount of
      duplication between the chip variants, which cause extra work when
      doing cross-architecture changes.
      
      Link: https://docs.blackfin.uclinux.org/
      Acked-by: Aaron Wu <Aaron.Wu@analog.com>
      Acked-by: Bryan Wu <cooloney@gmail.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Mike Frysinger <vapier@chromium.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      4ba66a97
  37. 23 Feb, 2018 1 commit