1. 14 Jan 2022: 1 commit
  2. 15 Nov 2021: 1 commit
  3. 16 Jul 2021: 4 commits
  4. 22 Nov 2020: 1 commit
    •
      irqchip/gic-v3-its: Unconditionally save/restore the ITS state on suspend · 74cde1a5
      Authored by Xu Qiang
      On systems without HW-based collections (i.e. anything except GIC-500),
      we rely on firmware to perform the ITS save/restore. This doesn't
      really work, as although FW can properly save everything, it cannot
      fully restore the state of the command queue (the read-side is reset
      to the head of the queue). This results in the ITS consuming previously
      processed commands, potentially corrupting the state.
      
      Instead, let's always save the ITS state on suspend, disabling it in the
      process, and restore the full state on resume. This saves us from broken
      FW as long as it doesn't enable the ITS by itself (for which we can't do
      anything).
      
      This amounts to simply dropping the ITS_FLAGS_SAVE_SUSPEND_STATE.
      Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
      [maz: added warning on resume, rewrote commit message]
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20201107104226.14282-1-xuqiang36@huawei.com
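      The save-and-disable-on-suspend, restore-on-resume flow can be sketched in plain C. This is an illustrative model only: the GITS_* names follow the architecture, but the structs and helpers are stand-ins, not the kernel driver's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define GITS_CTLR_ENABLE (1u << 0)

struct its_regs {
    uint32_t ctlr;
    uint64_t cbaser;
    uint64_t cwriter;   /* write-side pointer of the command queue */
};

struct its_saved {
    uint32_t ctlr;
    uint64_t cbaser;
    uint64_t cwriter;
};

/* Suspend: save the state unconditionally, then disable the ITS so
 * nothing can consume previously processed commands behind our back. */
static void its_suspend(struct its_regs *its, struct its_saved *s)
{
    s->ctlr = its->ctlr;
    s->cbaser = its->cbaser;
    s->cwriter = its->cwriter;
    its->ctlr &= ~GITS_CTLR_ENABLE;
}

/* Resume: warn if firmware re-enabled the ITS (we can't do anything
 * about that), then restore the full state, command queue included. */
static int its_resume(struct its_regs *its, const struct its_saved *s)
{
    int fw_enabled = its->ctlr & GITS_CTLR_ENABLE;

    if (fw_enabled)
        fprintf(stderr, "WARN: firmware enabled the ITS on resume\n");

    its->cbaser = s->cbaser;
    its->cwriter = s->cwriter;
    its->ctlr = s->ctlr;
    return fw_enabled ? -1 : 0;
}
```

      Restoring the queue pointers ourselves is what the broken firmware path could not do: it reset the read side to the head of the queue.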
  5. 14 Oct 2020: 1 commit
    •
      memblock: implement for_each_reserved_mem_region() using __next_mem_region() · 9f3d5eaa
      Authored by Mike Rapoport
      Iteration over memblock.reserved with for_each_reserved_mem_region() used
      __next_reserved_mem_region() that implemented a subset of
      __next_mem_region().
      
      Use __for_each_mem_range() and, essentially, __next_mem_region() with
      appropriate parameters to reduce code duplication.
      
      While on it, rename for_each_reserved_mem_region() to
      for_each_reserved_mem_range() for consistency.
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>	[.clang-format]
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-17-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
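      The deduplication pattern the commit applies can be modeled in a few lines: one generic region stepper, with the specialized reserved-only iterator defined on top of it. The names below are made up for the sketch and are not memblock's real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region { uint64_t base, size; };
struct region_array { const struct region *regions; size_t cnt; };

/* One generic stepper, parameterized by the array to walk; it
 * advances *idx itself and returns NULL at the end. */
static const struct region *next_region(const struct region_array *a,
                                        size_t *idx)
{
    if (*idx >= a->cnt)
        return NULL;
    return &a->regions[(*idx)++];
}

/* The specialized iterator is just the generic one with the reserved
 * array plugged in -- no duplicated walking logic to maintain. */
#define for_each_reserved_mem_range(a, i, r) \
    for ((i) = 0; ((r) = next_region((a), &(i))) != NULL; )
```

      This mirrors the commit's intent: `__next_reserved_mem_region()` duplicated a subset of `__next_mem_region()`, so the reserved walk is re-expressed in terms of the general one.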
  6. 24 Sep 2020: 1 commit
    •
      irq-chip/gic-v3-its: Fix crash if ITS is in a proximity domain without processor or memory · 95ac5bf4
      Authored by Jonathan Cameron
      Note this crash is present before any of the patches in this series, but
      as explained below it is highly unlikely that anyone is shipping firmware
      that triggers it. Tests were done using an overridden SRAT.
      
      On ARM64, the gic-v3 driver directly parses SRAT to locate GIC Interrupt
      Translation Service (ITS) Affinity Structures. This is done much later
      in the boot than the SRAT parsing passes that identify proximity domains.
      
      As a result, an ITS placed in a proximity domain that is not defined by
      another SRAT structure will result in a NUMA node that is not completely
      configured and a crash.
      
      ITS [mem 0x202100000-0x20211ffff]
      ITS@0x0000000202100000: Using ITS number 0
      Unable to handle kernel paging request at virtual address 0000000000001a08
      ...
      
      Call trace:
        __alloc_pages_nodemask+0xe8/0x338
        alloc_pages_node.constprop.0+0x34/0x40
        its_probe_one+0x2f8/0xb18
        gic_acpi_parse_madt_its+0x108/0x150
        acpi_table_parse_entries_array+0x17c/0x264
        acpi_table_parse_entries+0x48/0x6c
        acpi_table_parse_madt+0x30/0x3c
        its_init+0x1c4/0x644
        gic_init_bases+0x4b8/0x4ec
        gic_acpi_init+0x134/0x264
        acpi_match_madt+0x4c/0x84
        acpi_table_parse_entries_array+0x17c/0x264
        acpi_table_parse_entries+0x48/0x6c
        acpi_table_parse_madt+0x30/0x3c
        __acpi_probe_device_table+0x8c/0xe8
        irqchip_init+0x3c/0x48
        init_IRQ+0xcc/0x100
        start_kernel+0x33c/0x548
      
      ACPI 6.3 allows any set of Affinity Structures in SRAT to define a proximity
      domain.  However, as we do not see this crash, we can conclude that no
      firmware is currently placing an ITS in a node that is separate from
      those containing memory and / or processors.
      
      We could modify the SRAT parsing behavior to identify the existence
      of Proximity Domains unique to the ITS structures, and handle them as
      a special case of a generic initiator (once support for those merges).
      
      This patch avoids the complexity that would be needed to handle this corner
      case by not allowing the ITS entry parsing code to instantiate new NUMA
      nodes.  If one is encountered that does not already exist, NUMA_NO_NODE
      is assigned and a warning is printed, just as if the value had exceeded
      the number of allowed NUMA nodes.
      
      "SRAT: Invalid NUMA node -1 in ITS affinity"
      
      Whilst this does not provide the full flexibility allowed by ACPI,
      it does fix the problem.  We can revisit a more sophisticated solution if
      needed by future platforms.
      
      The change simply replaces acpi_map_pxm_to_node() with pxm_to_node(),
      reflecting the fact that a new mapping is not created.
      Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
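      The shape of the fix, create-on-demand versus lookup-only, can be sketched as follows. The table, encoding, and bounds here are illustrative stand-ins, not the kernel's actual pxm bookkeeping.

```c
#include <assert.h>

#define MAX_PXM 8
#define NUMA_NO_NODE (-1)

static int pxm_node_plus1[MAX_PXM]; /* 0 = no mapping yet */
static int nr_nodes;

/* Creates a node on first sight of a proximity domain; this is what
 * the early SRAT parsing passes are allowed to do. */
static int map_pxm_to_node(int pxm)
{
    if (pxm < 0 || pxm >= MAX_PXM)
        return NUMA_NO_NODE;
    if (!pxm_node_plus1[pxm])
        pxm_node_plus1[pxm] = ++nr_nodes;
    return pxm_node_plus1[pxm] - 1;
}

/* Lookup only: safe to call from the late ITS affinity parsing,
 * because it can never hand back a half-configured new node. */
static int pxm_to_node(int pxm)
{
    if (pxm < 0 || pxm >= MAX_PXM || !pxm_node_plus1[pxm])
        return NUMA_NO_NODE;
    return pxm_node_plus1[pxm] - 1;
}
```

      An ITS in a proximity domain no other SRAT structure defined then gets NUMA_NO_NODE and a warning, instead of a crash in the allocator.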
  7. 07 Sep 2020: 1 commit
  8. 24 Aug 2020: 1 commit
  9. 27 Jul 2020: 3 commits
    •
      genirq/affinity: Make affinity setting if activated opt-in · f0c7baca
      Authored by Thomas Gleixner
      John reported that on a RK3288 system the perf per CPU interrupts are all
      affine to CPU0 and provided the analysis:
      
       "It looks like what happens is that because the interrupts are not per-CPU
        in the hardware, armpmu_request_irq() calls irq_force_affinity() while
        the interrupt is deactivated and then request_irq() with IRQF_PERCPU |
        IRQF_NOBALANCING.  
      
        Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
        irq_setup_affinity() which returns early because IRQF_PERCPU and
        IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."
      
      This was broken by the recent commit which blocked interrupt affinity
      setting in hardware before activation of the interrupt. While this works
      in general, it does not work for this particular case. Contrary to the
      initial analysis, not all interrupt chip drivers implement an activate
      callback, so the safe cure is to make the deferred interrupt affinity
      setting at activation time opt-in.
      
      Implement the necessary core logic and make the two irqchip implementations
      for which this is required opt-in. In hindsight this would have been the
      right thing to do, but ...
      
      Fixes: baedb87d ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
      Reported-by: John Keeping <john@metanate.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
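      The opt-in mechanism can be sketched like this: affinity changes on an inactive interrupt are deferred to activation only when the chip sets an explicit flag, and everyone else keeps the old immediate behavior. The flag name, struct layout, and helpers below are hypothetical, chosen for the sketch rather than taken from the genirq internals.

```c
#include <assert.h>
#include <stdbool.h>

#define AFFINITY_ON_ACTIVATE_OPTIN 0x1  /* hypothetical opt-in flag */

struct irq_chip { unsigned flags; };
struct irq_data {
    struct irq_chip *chip;
    bool activated;
    int cpu;            /* affinity currently programmed in hardware */
    int pending_cpu;    /* deferred target, -1 if none */
};

static int irq_set_affinity(struct irq_data *d, int cpu)
{
    if (!d->activated && (d->chip->flags & AFFINITY_ON_ACTIVATE_OPTIN)) {
        /* Opted in: remember the target, program HW at activation. */
        d->pending_cpu = cpu;
        return 0;
    }
    d->cpu = cpu;       /* default: program the hardware immediately */
    return 0;
}

static void irq_activate(struct irq_data *d)
{
    d->activated = true;
    if (d->pending_cpu >= 0) {
        d->cpu = d->pending_cpu;    /* apply the deferred affinity */
        d->pending_cpu = -1;
    }
}
```

      Chips without an activate callback thus never lose an affinity write, which is exactly the case the blanket deferral broke.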
    •
      irqchip/gic-v4.1: Use GFP_ATOMIC flag in allocate_vpe_l1_table() · d1bd7e0b
      Authored by Zenghui Yu
      Booting the latest kernel with DEBUG_ATOMIC_SLEEP=y on a GICv4.1 enabled
      box, I get the following kernel splat:
      
      [    0.053766] BUG: sleeping function called from invalid context at mm/slab.h:567
      [    0.053767] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
      [    0.053769] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3+ #23
      [    0.053770] Call trace:
      [    0.053774]  dump_backtrace+0x0/0x218
      [    0.053775]  show_stack+0x2c/0x38
      [    0.053777]  dump_stack+0xc4/0x10c
      [    0.053779]  ___might_sleep+0xfc/0x140
      [    0.053780]  __might_sleep+0x58/0x90
      [    0.053782]  slab_pre_alloc_hook+0x7c/0x90
      [    0.053783]  kmem_cache_alloc_trace+0x60/0x2f0
      [    0.053785]  its_cpu_init+0x6f4/0xe40
      [    0.053786]  gic_starting_cpu+0x24/0x38
      [    0.053788]  cpuhp_invoke_callback+0xa0/0x710
      [    0.053789]  notify_cpu_starting+0xcc/0xd8
      [    0.053790]  secondary_start_kernel+0x148/0x200
      
       # ./scripts/faddr2line vmlinux its_cpu_init+0x6f4/0xe40
      its_cpu_init+0x6f4/0xe40:
      allocate_vpe_l1_table at drivers/irqchip/irq-gic-v3-its.c:2818
      (inlined by) its_cpu_init_lpis at drivers/irqchip/irq-gic-v3-its.c:3138
      (inlined by) its_cpu_init at drivers/irqchip/irq-gic-v3-its.c:5166
      
      It turns out that we are allocating memory with GFP_KERNEL (which may
      sleep) from within the CPU hotplug notifier, which is indeed an atomic
      context. Bad things may happen on a system with more than a single
      CommonLPIAff group. Avoid it by turning this into an atomic allocation.
      
      Fixes: 5e516846 ("irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation")
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200630133746.816-1-yuzenghui@huawei.com
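      The rule the fix enforces can be modeled in a few lines: CPU hotplug "starting" callbacks run with interrupts off, where a GFP_KERNEL allocation may sleep and trips DEBUG_ATOMIC_SLEEP. The context flag and allocator below are an illustrative model, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

enum gfp { GFP_KERNEL, GFP_ATOMIC };

static bool irqs_disabled_ctx;   /* models in_atomic()/irqs_disabled() */
static bool slept_in_atomic;     /* models the BUG splat firing */

static void *kmalloc_model(unsigned sz, enum gfp flags)
{
    static char pool[256];
    (void)sz;
    /* GFP_KERNEL may sleep: in atomic context that's the bug. */
    if (flags == GFP_KERNEL && irqs_disabled_ctx)
        slept_in_atomic = true;  /* "sleeping function called..." */
    return pool;
}

/* Fixed allocate_vpe_l1_table() shape: it is reached from the hotplug
 * notifier, so it must allocate with GFP_ATOMIC. */
static void *allocate_vpe_l1_table_model(void)
{
    return kmalloc_model(128, GFP_ATOMIC);
}
```

      The real fix is exactly this one-flag change at the allocation site; the surrounding model only exists to make the constraint testable.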
    •
      irqchip/gic-v4.1: Ensure accessing the correct RD when writing INVALLR · 3af9571c
      Authored by Zenghui Yu
      The GICv4.1 spec tells us that it's CONSTRAINED UNPREDICTABLE to issue a
      register-based invalidation operation for a vPEID not mapped to that RD,
      or another RD within the same CommonLPIAff group.
      
      To follow this rule, commit f3a05921 ("irqchip/gic-v4.1: Ensure mutual
      exclusion between vPE affinity change and RD access") tried to address the
      race between the RD accesses and the vPE affinity change, but somehow
      forgot to take GICR_INVALLR into account. Let's take the vpe_lock before
      evaluating vpe->col_idx to fix it.
      
      Fixes: f3a05921 ("irqchip/gic-v4.1: Ensure mutual exclusion between vPE affinity change and RD access")
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20200720092328.708-1-yuzenghui@huawei.com
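      The locking rule can be sketched as follows: vpe->col_idx names the owning redistributor and may change under a concurrent affinity move, so it must only be read, and the INVALLR write issued, while holding the vPE lock. The lock is modeled as a flag here; structs and helpers are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

struct vpe {
    bool locked;     /* models raw_spin_lock(&vpe->vpe_lock) */
    int col_idx;     /* owning redistributor, changes on affinity moves */
};

static int last_rd_written = -1;

static void vpe_lock(struct vpe *v)   { v->locked = true; }
static void vpe_unlock(struct vpe *v) { v->locked = false; }

/* Fixed flow: evaluate col_idx and write INVALLR under vpe_lock, so
 * an affinity change cannot redirect the write to the wrong RD (which
 * the spec calls CONSTRAINED UNPREDICTABLE). */
static void its_vpe_invall(struct vpe *v)
{
    vpe_lock(v);
    assert(v->locked);          /* the invariant the fix establishes */
    last_rd_written = v->col_idx;
    vpe_unlock(v);
}
```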
  10. 23 Jun 2020: 1 commit
    •
      KVM: arm64: vgic-v4: Plug race between non-residency and v4.1 doorbell · a3f574cd
      Authored by Marc Zyngier
      When making a vPE non-resident because it has hit a blocking WFI,
      the doorbell can fire at any time after the write to the RD.
      Crucially, it can fire right between the write to GICR_VPENDBASER
      and the write to the pending_last field in the its_vpe structure.
      
      This means that we would overwrite pending_last with stale data,
      and potentially not wakeup until some unrelated event (such as
      a timer interrupt) puts the vPE back on the CPU.
      
      GICv4 isn't affected by this as we actively mask the doorbell on
      entering the guest, while GICv4.1 automatically manages doorbell
      delivery without any hypervisor-driven masking.
      
      Use the vpe_lock to synchronize such updates, which solves the
      problem altogether.
      
      Fixes: ae699ad3 ("irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer")
      Reported-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
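      The race and its cure can be sketched like this: both the doorbell handler and the non-residency path serialize on the vPE lock, and the non-residency path only ever raises pending_last, so a doorbell that fired right after the GICR_VPENDBASER write can no longer be clobbered with stale data. The lock is modeled as a flag; all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

struct its_vpe {
    bool locked;        /* models vpe_lock */
    bool pending_last;
};

static void vpe_lock(struct its_vpe *v)
{
    assert(!v->locked); /* no interleaving while held */
    v->locked = true;
}
static void vpe_unlock(struct its_vpe *v) { v->locked = false; }

/* Doorbell handler: the vPE has a pending interrupt, wake it. */
static void doorbell_irq(struct its_vpe *v)
{
    vpe_lock(v);
    v->pending_last = true;
    vpe_unlock(v);
}

/* Non-residency path: reads PendingLast from the RD under the same
 * lock, and only ORs it in, never overwriting a doorbell's update. */
static void make_non_resident(struct its_vpe *v, bool rd_pending_last)
{
    vpe_lock(v);
    v->pending_last |= rd_pending_last;
    vpe_unlock(v);
}
```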
  11. 21 Jun 2020: 1 commit
  12. 20 May 2020: 1 commit
  13. 18 May 2020: 1 commit
  14. 16 Apr 2020: 2 commits
    •
      irqchip/gic-v4.1: Update effective affinity of virtual SGIs · 4b2dfe1e
      Authored by Marc Zyngier
      Although the vSGIs are not directly visible to the host, they still
      get moved around, by CPU hotplug for example. This results in
      the kernel moaning on the console, such as:
      
        genirq: irq_chip GICv4.1-sgi did not update eff. affinity mask of irq 38
      
      Updating the effective affinity on set_affinity() fixes it.
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
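      The missing step has a simple shape: a set_affinity callback must record the CPU it actually chose in the effective-affinity mask, or the core warns that the mask was not updated. Masks are plain bitmasks here and all names are illustrative, not the irqchip API.

```c
#include <assert.h>

struct irq_data {
    unsigned long affinity;       /* requested mask (bit per CPU) */
    unsigned long effective;      /* what the chip actually programmed */
};

static int first_cpu(unsigned long mask)
{
    int cpu = 0;
    if (!mask)
        return -1;
    while (!(mask & (1ul << cpu)))
        cpu++;
    return cpu;
}

static int sgi_set_affinity(struct irq_data *d, unsigned long mask)
{
    int cpu = first_cpu(mask);    /* pick one target CPU */
    if (cpu < 0)
        return -1;
    d->affinity = mask;
    d->effective = 1ul << cpu;    /* the update the fix adds */
    return 0;
}
```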
    •
      irqchip/gic-v4.1: Add support for VPENDBASER's Dirty+Valid signaling · 96806229
      Authored by Marc Zyngier
      When a vPE is made resident, the GIC starts parsing the virtual pending
      table to deliver pending interrupts. This takes place asynchronously,
      and can at times take a long while. Long enough that the vcpu enters
      the guest and hits WFI before any interrupt has been signaled yet.
      The vcpu then exits, blocks, and now gets a doorbell. Rinse, repeat.
      
      In order to avoid the above, an (optional on GICv4, mandatory on v4.1)
      feature allows the GIC to feedback to the hypervisor whether it is
      done parsing the VPT by clearing the GICR_VPENDBASER.Dirty bit.
      The hypervisor can then wait until the GIC is ready before actually
      running the vPE.
      
      Plug in the detection code as well as polling on vPE schedule. While
      at it, tidy up the kernel message that displays the GICv4 optional
      features.
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
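      The wait described above amounts to polling GICR_VPENDBASER until the GIC clears the Dirty bit, bounded by a timeout. The register accessor below is a stand-in for an MMIO read against a fake redistributor, and the bit position is illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define VPENDBASER_DIRTY (1ull << 60)  /* illustrative bit position */

/* Fake redistributor: Dirty reads as set for N more polls, modeling
 * the GIC still parsing the virtual pending table. */
static int polls_until_clean;
static uint64_t read_vpendbaser(void)
{
    if (polls_until_clean > 0) {
        polls_until_clean--;
        return VPENDBASER_DIRTY;
    }
    return 0;
}

/* Hypervisor side: returns 0 once the VPT parse finished, -1 on
 * timeout. Only then is the vPE actually run. */
static int wait_for_vpt_parse(int max_polls)
{
    while (max_polls--) {
        if (!(read_vpendbaser() & VPENDBASER_DIRTY))
            return 0;
    }
    return -1;
}
```

      Waiting here breaks the WFI/doorbell ping-pong: the vcpu never enters the guest before pending interrupts can be signaled.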
  15. 24 Mar 2020: 6 commits
  16. 21 Mar 2020: 5 commits
  17. 19 Mar 2020: 2 commits
  18. 16 Mar 2020: 2 commits
    •
      irqchip/gic-v4: Provide irq_retrigger to avoid circular locking dependency · 7809f701
      Authored by Marc Zyngier
      On a very heavily loaded D05 with GICv4, I managed to trigger the
      following lockdep splat:
      
      [ 6022.598864] ======================================================
      [ 6022.605031] WARNING: possible circular locking dependency detected
      [ 6022.611200] 5.6.0-rc4-00026-geee7c7b0f498 #680 Tainted: G            E
      [ 6022.618061] ------------------------------------------------------
      [ 6022.624227] qemu-system-aar/7569 is trying to acquire lock:
      [ 6022.629789] ffff042f97606808 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x54/0x7a0
      [ 6022.637102]
      [ 6022.637102] but task is already holding lock:
      [ 6022.642921] ffff002fae424cf0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x5c/0x98
      [ 6022.651350]
      [ 6022.651350] which lock already depends on the new lock.
      [ 6022.651350]
      [ 6022.659512]
      [ 6022.659512] the existing dependency chain (in reverse order) is:
      [ 6022.666980]
      [ 6022.666980] -> #2 (&irq_desc_lock_class){-.-.}:
      [ 6022.672983]        _raw_spin_lock_irqsave+0x50/0x78
      [ 6022.677848]        __irq_get_desc_lock+0x5c/0x98
      [ 6022.682453]        irq_set_vcpu_affinity+0x40/0xc0
      [ 6022.687236]        its_make_vpe_non_resident+0x6c/0xb8
      [ 6022.692364]        vgic_v4_put+0x54/0x70
      [ 6022.696273]        vgic_v3_put+0x20/0xd8
      [ 6022.700183]        kvm_vgic_put+0x30/0x48
      [ 6022.704182]        kvm_arch_vcpu_put+0x34/0x50
      [ 6022.708614]        kvm_sched_out+0x34/0x50
      [ 6022.712700]        __schedule+0x4bc/0x7f8
      [ 6022.716697]        schedule+0x50/0xd8
      [ 6022.720347]        kvm_arch_vcpu_ioctl_run+0x5f0/0x978
      [ 6022.725473]        kvm_vcpu_ioctl+0x3d4/0x8f8
      [ 6022.729820]        ksys_ioctl+0x90/0xd0
      [ 6022.733642]        __arm64_sys_ioctl+0x24/0x30
      [ 6022.738074]        el0_svc_common.constprop.3+0xa8/0x1e8
      [ 6022.743373]        do_el0_svc+0x28/0x88
      [ 6022.747198]        el0_svc+0x14/0x40
      [ 6022.750761]        el0_sync_handler+0x124/0x2b8
      [ 6022.755278]        el0_sync+0x140/0x180
      [ 6022.759100]
      [ 6022.759100] -> #1 (&rq->lock){-.-.}:
      [ 6022.764143]        _raw_spin_lock+0x38/0x50
      [ 6022.768314]        task_fork_fair+0x40/0x128
      [ 6022.772572]        sched_fork+0xe0/0x210
      [ 6022.776484]        copy_process+0x8c4/0x18d8
      [ 6022.780742]        _do_fork+0x88/0x6d8
      [ 6022.784478]        kernel_thread+0x64/0x88
      [ 6022.788563]        rest_init+0x30/0x270
      [ 6022.792390]        arch_call_rest_init+0x14/0x1c
      [ 6022.796995]        start_kernel+0x498/0x4c4
      [ 6022.801164]
      [ 6022.801164] -> #0 (&p->pi_lock){-.-.}:
      [ 6022.806382]        __lock_acquire+0xdd8/0x15c8
      [ 6022.810813]        lock_acquire+0xd0/0x218
      [ 6022.814896]        _raw_spin_lock_irqsave+0x50/0x78
      [ 6022.819761]        try_to_wake_up+0x54/0x7a0
      [ 6022.824018]        wake_up_process+0x1c/0x28
      [ 6022.828276]        wakeup_softirqd+0x38/0x40
      [ 6022.832533]        __tasklet_schedule_common+0xc4/0xf0
      [ 6022.837658]        __tasklet_schedule+0x24/0x30
      [ 6022.842176]        check_irq_resend+0xc8/0x158
      [ 6022.846609]        irq_startup+0x74/0x128
      [ 6022.850606]        __enable_irq+0x6c/0x78
      [ 6022.854602]        enable_irq+0x54/0xa0
      [ 6022.858431]        its_make_vpe_non_resident+0xa4/0xb8
      [ 6022.863557]        vgic_v4_put+0x54/0x70
      [ 6022.867469]        kvm_arch_vcpu_blocking+0x28/0x38
      [ 6022.872336]        kvm_vcpu_block+0x48/0x490
      [ 6022.876594]        kvm_handle_wfx+0x18c/0x310
      [ 6022.880938]        handle_exit+0x138/0x198
      [ 6022.885022]        kvm_arch_vcpu_ioctl_run+0x4d4/0x978
      [ 6022.890148]        kvm_vcpu_ioctl+0x3d4/0x8f8
      [ 6022.894494]        ksys_ioctl+0x90/0xd0
      [ 6022.898317]        __arm64_sys_ioctl+0x24/0x30
      [ 6022.902748]        el0_svc_common.constprop.3+0xa8/0x1e8
      [ 6022.908046]        do_el0_svc+0x28/0x88
      [ 6022.911871]        el0_svc+0x14/0x40
      [ 6022.915434]        el0_sync_handler+0x124/0x2b8
      [ 6022.919951]        el0_sync+0x140/0x180
      [ 6022.923773]
      [ 6022.923773] other info that might help us debug this:
      [ 6022.923773]
      [ 6022.931762] Chain exists of:
      [ 6022.931762]   &p->pi_lock --> &rq->lock --> &irq_desc_lock_class
      [ 6022.931762]
      [ 6022.942101]  Possible unsafe locking scenario:
      [ 6022.942101]
      [ 6022.948007]        CPU0                    CPU1
      [ 6022.952523]        ----                    ----
      [ 6022.957039]   lock(&irq_desc_lock_class);
      [ 6022.961036]                                lock(&rq->lock);
      [ 6022.966595]                                lock(&irq_desc_lock_class);
      [ 6022.973109]   lock(&p->pi_lock);
      [ 6022.976324]
      [ 6022.976324]  *** DEADLOCK ***
      
      This is happening because we have a pending doorbell that requires
      retrigger. As SW retriggering is done in a tasklet, we trigger the
      circular dependency above.
      
      The easy cop-out is to provide a retrigger callback that doesn't
      require acquiring any extra lock.
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200310184921.23552-5-maz@kernel.org
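      The cop-out has this shape: if the chip provides an irq_retrigger callback, the resend path uses it directly instead of scheduling the software-resend tasklet, which is what dragged scheduler locks (via wakeup_softirqd) into the irq_desc lock chain. Struct layout and names below are illustrative, not the genirq internals.

```c
#include <assert.h>
#include <stdbool.h>

struct irq_data;
struct irq_chip {
    int (*irq_retrigger)(struct irq_data *d);
};
struct irq_data { struct irq_chip *chip; };

static bool tasklet_scheduled;
static bool hw_retriggered;

/* The new callback: pokes the hardware to re-deliver the interrupt
 * without acquiring any extra lock. */
static int its_irq_retrigger(struct irq_data *d)
{
    (void)d;
    hw_retriggered = true;
    return 1;   /* handled in hardware */
}

static void check_irq_resend(struct irq_data *d)
{
    if (d->chip->irq_retrigger && d->chip->irq_retrigger(d))
        return;                 /* hardware did it for us */
    tasklet_scheduled = true;   /* fallback: SW resend via tasklet,
                                 * the path that triggers the splat */
}
```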
    •
      irqchip/gic-v3-its: Probe ITS page size for all GITS_BASERn registers · d5df9dc9
      Authored by Marc Zyngier
      The GICv3 ITS driver assumes that once it has latched on a page size for
      a given BASER register, it can use the same page size as the maximum
      page size for all subsequent BASER registers.
      
      Although it worked so far, nothing in the architecture guarantees this,
      and Nianyao Tang hit this problem on some undisclosed implementation.
      
      Let's bite the bullet and probe the supported page size on all BASER
      registers before starting to populate the tables. This simplifies the
      setup a bit, at the expense of a few additional MMIO accesses.
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Reported-by: Nianyao Tang <tangnianyao@huawei.com>
      Tested-by: Nianyao Tang <tangnianyao@huawei.com>
      Link: https://lore.kernel.org/r/1584089195-63897-1-git-send-email-zhangshaokun@hisilicon.com
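      The probing change can be sketched as follows: instead of latching the first BASER's page size and assuming it for the rest, every GITS_BASERn is probed up front, trying the largest size first. The "register" here is modeled as a per-BASER maximum; the real driver writes the size and reads it back to see whether the implementation accepted it.

```c
#include <assert.h>

#define NR_BASERS 8
static const int sizes[] = { 65536, 16384, 4096 };  /* 64K, 16K, 4K */

/* Model of each register's capability; stands in for the
 * write-then-read-back probe against real hardware. */
static int baser_max_supported[NR_BASERS];

static int probe_baser_psz(int n)
{
    for (unsigned i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
        if (sizes[i] <= baser_max_supported[n])
            return sizes[i];    /* largest size this BASER accepts */
    return -1;
}

/* Probe all BASERs before populating any table, so every table uses a
 * page size its own register is known to support. */
static int probe_all(int psz_out[NR_BASERS])
{
    for (int n = 0; n < NR_BASERS; n++) {
        psz_out[n] = probe_baser_psz(n);
        if (psz_out[n] < 0)
            return -1;
    }
    return 0;
}
```

      A few extra MMIO accesses at init time buy correctness on implementations where the registers genuinely differ.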
  19. 08 Mar 2020: 1 commit
  20. 10 Feb 2020: 1 commit
  21. 08 Feb 2020: 3 commits