1. 01 Aug 2020, 1 commit
  2. 31 Jul 2020, 7 commits
  3. 30 Jul 2020, 6 commits
  4. 29 Jul 2020, 3 commits
  5. 28 Jul 2020, 5 commits
  6. 27 Jul 2020, 18 commits
    • genirq/affinity: Make affinity setting if activated opt-in · f0c7baca
      Committed by Thomas Gleixner
      John reported that on a RK3288 system the perf per CPU interrupts are all
      affine to CPU0 and provided the analysis:
      
       "It looks like what happens is that because the interrupts are not per-CPU
        in the hardware, armpmu_request_irq() calls irq_force_affinity() while
        the interrupt is deactivated and then request_irq() with IRQF_PERCPU |
        IRQF_NOBALANCING.  
      
        Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
        irq_setup_affinity() which returns early because IRQF_PERCPU and
        IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."
      
      This was broken by the recent commit which blocked interrupt affinity
      setting in hardware before activation of the interrupt. While this works in
      general, it does not work for this particular case. Contrary to the
      initial analysis, not all interrupt chip drivers implement an activate
      callback, so the safe cure is to make the deferred interrupt affinity setting
      at activation time opt-in.
      
      Implement the necessary core logic and make the two irqchip implementations
      for which this is required opt-in. In hindsight this would have been the
      right thing to do, but ...
      
      Fixes: baedb87d ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
      Reported-by: John Keeping <john@metanate.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
      f0c7baca
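      The opt-in pattern can be illustrated with a small userspace model. This is a hedged
      sketch of the idea only; the struct and flag names below (toy_irq_desc,
      AFFINITY_ON_ACTIVATE, ...) are invented for illustration and are not the genirq API:

        #include <stdbool.h>
        #include <stdio.h>

        #define AFFINITY_ON_ACTIVATE 0x1   /* opt-in: defer affinity to activation */

        struct toy_irq_desc {
            unsigned int flags;
            int requested_cpu;      /* affinity requested while inactive */
            int effective_cpu;      /* what the (toy) hardware was told */
            bool activated;
        };

        /* Called while the interrupt may still be deactivated. */
        static void toy_set_affinity(struct toy_irq_desc *d, int cpu)
        {
            d->requested_cpu = cpu;
            if (!d->activated && (d->flags & AFFINITY_ON_ACTIVATE))
                return;             /* defer the hardware write */
            d->effective_cpu = cpu; /* default: write through immediately */
        }

        static void toy_activate(struct toy_irq_desc *d)
        {
            d->activated = true;
            if (d->flags & AFFINITY_ON_ACTIVATE)
                d->effective_cpu = d->requested_cpu;  /* apply deferred value */
        }

        int main(void)
        {
            struct toy_irq_desc dflt = { 0 };
            struct toy_irq_desc optin = { .flags = AFFINITY_ON_ACTIVATE };

            toy_set_affinity(&dflt, 2);
            toy_set_affinity(&optin, 2);
            toy_activate(&dflt);
            toy_activate(&optin);
            printf("default cpu=%d, opt-in cpu=%d\n",
                   dflt.effective_cpu, optin.effective_cpu);
            return 0;
        }

      Both descriptors end up on the requested CPU; the opt-in one simply performs the
      hardware write at activation time instead of while deactivated.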
    • locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs · ed004953
      Committed by peterz@infradead.org
      Prior to commit:
      
        859d069e ("lockdep: Prepare for NMI IRQ state tracking")
      
      IRQ state tracking was disabled in NMIs due to nmi_enter()
      doing lockdep_off() -- with the obvious requirement that NMI entry
      call nmi_enter() before trace_hardirqs_off().
      
      [ AFAICT, PowerPC and SH violate this order on their NMI entry ]
      
      However, that commit explicitly changed lockdep_hardirqs_*() to ignore
      lockdep_off(), which breaks every architecture that has irq-tracing in
      its NMI entry and has not yet been fixed up (x86 being the only fixed one
      at this point).
      
      The reason for this change is that by ignoring lockdep_off() we can:
      
        - get rid of 'current->lockdep_recursion' in lockdep_assert_irqs*(),
          which was going to give header-recursion issues with the
          seqlock rework.
      
        - allow these lockdep_assert_*() macros to function in NMI context.
      
      Restore the previous state of things and allow an architecture to
      opt in to the NMI IRQ tracking support; however, instead of relying on
      lockdep_off(), rely on in_nmi(). Both are part of nmi_enter(), so the
      overall entry ordering doesn't need to change.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200727124852.GK119549@hirez.programming.kicks-ass.net
      ed004953
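      A minimal userspace sketch of the resulting rule, with invented names throughout
      (toy_*, arch_nmi_irq_tracking): hardirq-state tracking calls are ignored in NMI
      context unless the architecture opted in, and the NMI check relies on an
      in_nmi()-style nesting counter that nmi_enter() maintains anyway:

        #include <stdbool.h>
        #include <stdio.h>

        static bool arch_nmi_irq_tracking;           /* opt-in switch (assumed name) */
        static int  nmi_nesting;                     /* stand-in for the in_nmi() state */
        static bool tracked_hardirqs_enabled = true; /* what "lockdep" believes */

        static bool toy_in_nmi(void) { return nmi_nesting > 0; }

        static void toy_trace_hardirqs_off(void)
        {
            if (toy_in_nmi() && !arch_nmi_irq_tracking)
                return;                      /* ignore tracking from NMI context */
            tracked_hardirqs_enabled = false;
        }

        static void toy_nmi_enter(void) { nmi_nesting++; }
        static void toy_nmi_exit(void)  { nmi_nesting--; }

        int main(void)
        {
            toy_nmi_enter();
            toy_trace_hardirqs_off();        /* no effect: arch did not opt in */
            toy_nmi_exit();
            printf("tracked state still enabled: %s\n",
                   tracked_hardirqs_enabled ? "yes" : "no");
            return 0;
        }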
    • KVM: nVMX: check for invalid hdr.vmx.flags · 5e105c88
      Committed by Paolo Bonzini
      hdr.vmx.flags is meant for future extensions to the ABI; rejecting
      invalid flags is necessary to avoid broken half-loads of the
      nVMX state.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5e105c88
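      The check itself is the classic "reject unknown bits" pattern. A self-contained
      sketch with placeholder flag names (the TOY_* constants are not the real KVM ABI
      definitions):

        #include <errno.h>
        #include <stdint.h>
        #include <stdio.h>

        #define TOY_VMX_FLAG_A      (1u << 0)   /* placeholder for a defined flag */
        #define TOY_VMX_FLAG_B      (1u << 1)   /* placeholder for a defined flag */
        #define TOY_VMX_VALID_FLAGS (TOY_VMX_FLAG_A | TOY_VMX_FLAG_B)

        /* Reject any bit that is not (yet) defined; those bits are reserved
         * for future ABI extensions and must be zero today. */
        static int check_hdr_vmx_flags(uint32_t flags)
        {
            if (flags & ~TOY_VMX_VALID_FLAGS)
                return -EINVAL;
            return 0;
        }

        int main(void)
        {
            printf("flags=0x1 -> %d\n", check_hdr_vmx_flags(0x1));  /* accepted */
            printf("flags=0x8 -> %d\n", check_hdr_vmx_flags(0x8));  /* -EINVAL */
            return 0;
        }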
    • KVM: nVMX: check for required but missing VMCS12 in KVM_SET_NESTED_STATE · 0f02bd0a
      Committed by Paolo Bonzini
      A missing VMCS12 did not cause -EINVAL (it was just read with
      copy_from_user, so it is not a security issue, but it is still
      wrong).  Test for VMCS12 validity and reject the nested state
      if a VMCS12 is required but not present.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0f02bd0a
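      A hedged sketch of the presence check, with invented sizes and struct layout (this
      is not the KVM uAPI): if the header says the vCPU is in guest mode, the buffer must
      be large enough to actually carry a VMCS12, otherwise return -EINVAL instead of
      silently reading garbage:

        #include <errno.h>
        #include <stddef.h>
        #include <stdio.h>

        #define TOY_HDR_SIZE    128     /* invented header size */
        #define TOY_VMCS12_SIZE 4096    /* invented VMCS12 payload size */

        struct toy_nested_state {
            size_t size;        /* total size userspace claims to pass in */
            int    guest_mode;  /* non-zero: a VMCS12 payload is required */
        };

        static int toy_set_nested_state(const struct toy_nested_state *s)
        {
            if (s->guest_mode && s->size < TOY_HDR_SIZE + TOY_VMCS12_SIZE)
                return -EINVAL;          /* required VMCS12 is missing */
            return 0;                    /* ... continue loading the state ... */
        }

        int main(void)
        {
            struct toy_nested_state missing = { .size = TOY_HDR_SIZE, .guest_mode = 1 };
            struct toy_nested_state present = { .size = TOY_HDR_SIZE + TOY_VMCS12_SIZE,
                                                .guest_mode = 1 };

            printf("missing -> %d, present -> %d\n",
                   toy_set_nested_state(&missing), toy_set_nested_state(&present));
            return 0;
        }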
    • s390/vmemmap: coding style updates · 9a996c67
      Committed by Heiko Carstens
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      9a996c67
    • s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections · 2c114df0
      Committed by David Hildenbrand
      Let's avoid memset(PAGE_UNUSED) when adding consecutive sections, where
      the vmemmap of a single section does not span full PMDs.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-10-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      2c114df0
    • s390/vmemmap: remember unused sub-pmd ranges · cd5781d6
      Committed by David Hildenbrand
      With a memmap size of 56 bytes or 72 bytes per page, the memmap for a
      256 MB section won't span full PMDs. As we populate single sections and
      depopulate single sections, the depopulation step would not be able to
      free all vmemmap pmds anymore.
      
      Do it similarly to x86, marking the unused memmap ranges in a special way
      (padding them with 0xFD).
      
      This allows us to add/remove sections, cleaning up all allocated
      vmemmap pages even if the memmap size is not a multiple of 16 bytes per page.
      
      A 56 byte memmap can, for example, be created with !CONFIG_MEMCG and
      !CONFIG_SLUB.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-9-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      cd5781d6
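      The 0xFD trick can be shown with a scaled-down model. This is only a sketch under
      invented names and sizes (TOY_PMD_SIZE, the toy pmd_block array): unused parts of a
      PMD-sized memmap block are filled with a marker byte, and the block is only freed
      once the whole block reads back as marker bytes:

        #include <stdbool.h>
        #include <stdio.h>
        #include <string.h>

        #define TOY_PMD_SIZE 64         /* scaled-down "PMD-sized" memmap block */
        #define PAGE_UNUSED  0xFD       /* marker byte from the commit message */

        static unsigned char pmd_block[TOY_PMD_SIZE];

        static void mark_unused(size_t start, size_t end)
        {
            memset(pmd_block + start, PAGE_UNUSED, end - start);
        }

        static bool pmd_fully_unused(void)
        {
            for (size_t i = 0; i < TOY_PMD_SIZE; i++)
                if (pmd_block[i] != PAGE_UNUSED)
                    return false;
            return true;
        }

        int main(void)
        {
            memset(pmd_block, 0, sizeof(pmd_block));          /* "live" memmap data */
            mark_unused(0, 40);                               /* first section removed */
            printf("can free PMD? %d\n", pmd_fully_unused()); /* 0: tail still in use */
            mark_unused(40, TOY_PMD_SIZE);                    /* second section removed */
            printf("can free PMD? %d\n", pmd_fully_unused()); /* 1: whole block unused */
            return 0;
        }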
    • s390/vmemmap: fallback to PTEs if mapping large PMD fails · f2057b42
      Committed by David Hildenbrand
      Let's fall back to single pages if short on huge pages. No need to stop
      memory hotplug.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-8-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      f2057b42
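      The fallback itself is a plain try-large-then-retry-small pattern. A hedged
      userspace sketch (allocation sizes and names are illustrative, not the s390
      implementation):

        #include <stdbool.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define TOY_PAGE_SIZE 4096
        #define TOY_PMD_PAGES 512       /* base pages per "PMD-sized" range */

        /* Simulate the huge-page allocator running dry. */
        static void *alloc_huge(bool available)
        {
            return available ? malloc((size_t)TOY_PMD_PAGES * TOY_PAGE_SIZE) : NULL;
        }

        static int populate_pmd_range(bool huge_available)
        {
            void *huge = alloc_huge(huge_available);

            if (huge) {
                printf("backed range with one large mapping\n");
                free(huge);
                return 0;
            }
            /* Fall back to base pages instead of failing memory hotplug. */
            for (int i = 0; i < TOY_PMD_PAGES; i++) {
                void *page = malloc(TOY_PAGE_SIZE);
                if (!page)
                    return -1;
                free(page);
            }
            printf("backed range with %d base pages\n", TOY_PMD_PAGES);
            return 0;
        }

        int main(void)
        {
            populate_pmd_range(true);
            populate_pmd_range(false);   /* short on huge pages: degrade, don't fail */
            return 0;
        }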
    • s390/vmem: cleanup empty page tables · b9ff8100
      Committed by David Hildenbrand
      Let's clean up empty page tables. Consider only page tables that fully
      fall into the identity mapping and the vmemmap range.
      
      As there are no valid accesses to vmem/vmemmap within non-populated ranges,
      the single tlb flush at the end should be sufficient.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-7-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      b9ff8100
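      The cleanup rule reduces to "free a table only when every entry is empty". A toy
      sketch with invented types (the real code additionally checks that the table lies
      fully inside the identity mapping or vmemmap range):

        #include <stdbool.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define TOY_PTRS_PER_TABLE 8

        struct toy_table {
            void *entries[TOY_PTRS_PER_TABLE];
        };

        static bool table_empty(const struct toy_table *t)
        {
            for (int i = 0; i < TOY_PTRS_PER_TABLE; i++)
                if (t->entries[i])
                    return false;
            return true;
        }

        static void try_free_table(struct toy_table **slot)
        {
            if (*slot && table_empty(*slot)) {
                free(*slot);
                *slot = NULL;   /* one TLB flush at the end covers all removals */
            }
        }

        int main(void)
        {
            struct toy_table *pt = calloc(1, sizeof(*pt));

            if (!pt)
                return 1;
            pt->entries[3] = (void *)0x1000;   /* still holds a mapping */
            try_free_table(&pt);
            printf("pass 1: %s\n", pt ? "kept" : "freed");

            pt->entries[3] = NULL;             /* last mapping removed */
            try_free_table(&pt);
            printf("pass 2: %s\n", pt ? "kept" : "freed");
            return 0;
        }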
    • s390/vmemmap: take the vmem_mutex when populating/freeing · aa18e0e6
      Committed by David Hildenbrand
      Let's synchronize all accesses to the 1:1 and vmemmap mappings. This will
      be especially relevant when wanting to clean up empty page tables that could
      be shared by both. Avoid races when removing tables that might be just
      about to get reused.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-6-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      aa18e0e6
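      The locking discipline is simply "every populate/free of the shared tables runs
      under one mutex". A minimal pthread sketch of that rule (all names are placeholders,
      not the s390 code):

        #include <pthread.h>
        #include <stdio.h>

        static pthread_mutex_t vmem_lock = PTHREAD_MUTEX_INITIALIZER;
        static int tables_in_use;   /* stand-in for shared 1:1/vmemmap page tables */

        static void toy_populate(void)
        {
            pthread_mutex_lock(&vmem_lock);
            tables_in_use++;         /* allocate or reuse shared tables */
            pthread_mutex_unlock(&vmem_lock);
        }

        static void toy_free(void)
        {
            pthread_mutex_lock(&vmem_lock);
            if (tables_in_use > 0)
                tables_in_use--;     /* tear down only under the lock, so a concurrent
                                      * populate cannot grab a table that is just
                                      * about to be freed */
            pthread_mutex_unlock(&vmem_lock);
        }

        int main(void)
        {
            toy_populate();
            toy_free();
            printf("tables in use: %d\n", tables_in_use);
            return 0;
        }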
    • s390/vmemmap: cleanup when vmemmap_populate() fails · c00f05a9
      Committed by David Hildenbrand
      Clean up what we partially added in case vmemmap_populate() fails. For
      vmem, this is already handled by vmem_add_mapping().
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-5-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      c00f05a9
    • s390/vmemmap: extend modify_pagetable() to handle vmemmap · 9ec8fa8d
      Committed by David Hildenbrand
      Extend our shiny new modify_pagetable() to handle !direct (vmemmap)
      mappings. Convert vmemmap_populate() and implement vmemmap_free().
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-4-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      9ec8fa8d
    • s390/vmem: consolidate vmem_add_range() and vmem_remove_range() · 3e0d3e40
      Committed by David Hildenbrand
      We want to have only a single pagetable walker and reuse the same
      functionality for vmemmap handling. Let's start by consolidating
      vmem_add_range() and vmem_remove_range(), converting them into a
      recursive implementation.
      
      A recursive implementation makes it easier to expand individual cases
      without harming readability. In addition, we minimize traversing the
      whole hierarchy over and over again.
      
      One change is that we don't unmap large PMDs/PUDs when they are not
      completely covered by the request; that should never happen with direct
      mappings, unless one were removing at a different granularity than was
      added, which would be broken already.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-3-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      3e0d3e40
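      The consolidation boils down to one recursive walker whose behaviour is selected by
      an add/remove flag. A very small sketch of that shape, with toy levels and entry
      counts (this is not the actual s390 walker):

        #include <stdbool.h>
        #include <stdio.h>

        #define TOY_LEVELS  3    /* e.g. region -> segment -> page, collapsed for the toy */
        #define TOY_ENTRIES 4

        static int present[TOY_LEVELS][TOY_ENTRIES];   /* 1 = entry populated */

        /* One walker for both directions: 'add' selects populate vs. clear. */
        static void modify_level(int level, bool add)
        {
            if (level == TOY_LEVELS)
                return;                          /* walked past the leaf level */
            for (int i = 0; i < TOY_ENTRIES; i++) {
                present[level][i] = add ? 1 : 0;
                modify_level(level + 1, add);    /* descend into the next level */
            }
        }

        int main(void)
        {
            modify_level(0, true);               /* vmem_add_range()-style call */
            printf("after add:    top entry = %d\n", present[0][0]);
            modify_level(0, false);              /* vmem_remove_range()-style call */
            printf("after remove: top entry = %d\n", present[0][0]);
            return 0;
        }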
    • s390/vmem: rename vmem_add_mem() to vmem_add_range() · 8398b226
      Committed by David Hildenbrand
      Let's match the name to vmem_remove_range().
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-2-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      8398b226
    • s390: enable HAVE_FUNCTION_ERROR_INJECTION · 73d6eb48
      Committed by Ilya Leoshkevich
      This kernel feature is required for enabling BPF_KPROBE_OVERRIDE.
      
      Define override_function_with_return() and regs_set_return_value()
      functions, and fix compile errors in syscall_wrapper.h.
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      73d6eb48
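      What the two helpers provide can be mimicked in userspace. A hedged sketch with a
      stand-in register structure (the toy_* names and layout are assumptions for
      illustration, not the real s390 definitions): one helper rewrites the saved return
      value, the other redirects execution straight back to the caller, which is what
      error injection needs:

        #include <stdio.h>

        struct toy_regs {
            unsigned long gprs[16];     /* gpr 2 carries the return value here */
            unsigned long psw_addr;     /* next instruction address */
        };

        static void toy_regs_set_return_value(struct toy_regs *regs, unsigned long rc)
        {
            regs->gprs[2] = rc;
        }

        static void toy_override_function_with_return(struct toy_regs *regs,
                                                      unsigned long return_address)
        {
            regs->psw_addr = return_address;   /* skip the body, resume in caller */
        }

        int main(void)
        {
            struct toy_regs regs = { .psw_addr = 0x1000 };

            /* Error injection: make the probed function "return" -12 (-ENOMEM). */
            toy_regs_set_return_value(&regs, (unsigned long)-12L);
            toy_override_function_with_return(&regs, 0x2000);

            printf("injected rc=%ld, resume at 0x%lx\n",
                   (long)regs.gprs[2], regs.psw_addr);
            return 0;
        }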
    • s390/pci: clarify comment in s390_mmio_read/write · 4631f3ca
      Committed by Niklas Schnelle
      The existing comment was talking about reading in the write part
      and vice versa. While we are here, make it clearer why restricting
      the syscalls to MIO-capable devices is okay.
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      4631f3ca
    • powerpc/64s/hash: Fix hash_preload running with interrupts enabled · 909adfc6
      Committed by Nicholas Piggin
      Commit 2f92447f ("powerpc/book3s64/hash: Use the pte_t address from the
      caller") removed the local_irq_disable from hash_preload, but it was
      required for more than just the page table walk: the hash pte busy bit is
      effectively a lock which may be taken in interrupt context, and the local
      update flag test must not be preempted before it's used.
      
      This solves apparent lockups with perf interrupting __hash_page_64K. If
      get_perf_callchain then also takes a hash fault on the same page while it
      is already locked, it will loop forever taking hash faults, which looks like
      this:
      
        cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70]
            pc: c000000000072dc8: hash_page_mm+0x8/0x800
            lr: c00000000000c5a4: do_hash_page+0x24/0x38
            sp: c0002ac1cc69ac70
           msr: 8000000000081033
          current = 0xc0002ac1cc602e00
          paca    = 0xc00000001de1f280   irqmask: 0x03   irq_happened: 0x01
            pid   = 20118, comm = pread2_processe
        Linux version 5.8.0-rc6-00345-g1fad14f18bc6
        49e:mon> t
        [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable)
        --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac
        [link register   ] c000000000335d10 copy_from_user_nofault+0xf0/0x150
        [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable)
        [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0
        [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410
        [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40
        [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360
        [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0
        [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790
        [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0
        [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0
        [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750
        [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510
        [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60
        [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0
        --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160
        [link register   ] c0000000000846f0 __hash_page_64K+0x210/0x540
        [c0002ac1cc69ba50] 0000000000000000 (unreliable)
        [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0
        [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0
        [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60
        [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60
        [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90
        [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c
        --- Exception: 300 (Data Access) at 00007fff8c861fe8
        SP (7ffff6b19660) is in userspace
      
      Fixes: 2f92447f ("powerpc/book3s64/hash: Use the pte_t address from the caller")
      Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Reported-by: Anton Blanchard <anton@ozlabs.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200727060947.10060-1-npiggin@gmail.com
      909adfc6
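      The shape of the fix is the usual "take the pseudo-lock only with interrupts
      disabled" pattern. A toy sketch of that shape only (the names and the busy-bit model
      are placeholders, not the powerpc code):

        #include <stdio.h>

        static int irqs_off;     /* >0: interrupts (conceptually) disabled */
        static int hpte_busy;    /* the busy bit doubling as a lock */

        static void toy_local_irq_save(void)    { irqs_off++; }
        static void toy_local_irq_restore(void) { irqs_off--; }

        static void toy_hash_preload(void)
        {
            toy_local_irq_save();    /* restored by the fix */
            hpte_busy = 1;           /* take the busy bit */
            /*
             * Walk the page table, test the local-update flag and insert the HPTE.
             * With interrupts off, a perf interrupt cannot land here, fault on the
             * same page and spin forever on the busy bit we are holding.
             */
            hpte_busy = 0;
            toy_local_irq_restore();
        }

        int main(void)
        {
            toy_hash_preload();
            printf("irqs_off=%d hpte_busy=%d\n", irqs_off, hpte_busy);
            return 0;
        }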
    • x86/ioperm: Initialize pointer bitmap with NULL rather than 0 · 90fc7392
      Committed by Colin Ian King
      The pointer bitmap is being initialized with a plain integer 0; fix
      this by initializing it with NULL instead.
      
      Cleans up sparse warning:
      arch/x86/xen/enlighten_pv.c:876:27: warning: Using plain integer
      as NULL pointer
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Link: https://lore.kernel.org/r/20200721100217.407975-1-colin.king@canonical.com
      90fc7392
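      For completeness, the whole fix in miniature (the variable names are illustrative,
      not the actual enlighten_pv.c code):

        #include <stddef.h>
        #include <stdio.h>

        int main(void)
        {
            const char *bitmap_before = 0;     /* legal C, but sparse warns:
                                                  "Using plain integer as NULL pointer" */
            const char *bitmap_after  = NULL;  /* states the intent explicitly */

            printf("%p %p\n", (const void *)bitmap_before, (const void *)bitmap_after);
            return 0;
        }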