1. 23 Aug 2019, 2 commits
    • KVM: PPC: Book3S HV: Define usage types for rmap array in guest memslot · d22deab6
      Suraj Jitindar Singh authored
      The rmap array in the guest memslot is an array of size equal to the
      number of guest pages, allocated at memslot creation time. Each rmap
      entry in this array is used to store information about the guest page
      to which it corresponds. For example, for an HPT guest it is used to
      store a lock bit, RC bits, a present bit and the index of the HPT
      entry in the guest HPT which maps this page. For a radix guest which
      is running nested guests, it is used to store a pointer to a linked
      list of nested rmap entries, which record the nested guest physical
      address that maps this guest address and for which there is a PTE in
      the shadow page table.
      
      As there are currently two uses for the rmap array, and the potential
      exists for this to expand to more in the future, define a type field
      (the top 8 bits of the rmap entry) to identify the type of rmap entry
      currently present, and define two values for this field for the two
      current uses of the rmap array.
      
      Since the nested case uses the rmap entry to store a pointer, define
      this type as having the two high bits set, as is expected for a
      pointer. Define the HPT entry type as having bit 56 set (bit 7 in IBM
      bit ordering).
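
      A hedged sketch of the encoding described above (the names are
      illustrative, not necessarily those used in the patch):

        /* Top 8 bits of an rmap entry hold the entry type */
        #define RMAP_TYPE_MASK          0xff00000000000000UL
        /* Nested rmap list pointer: two high bits set, pointer-like */
        #define RMAP_TYPE_NESTED        0xc000000000000000UL
        /* HPT guest entry: bit 56 set (bit 7 in IBM bit ordering) */
        #define RMAP_TYPE_HPT           0x0100000000000000UL
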
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      d22deab6
    • KVM: PPC: Book3S: Mark expected switch fall-through · ff7240cc
      Paul Menzel authored
      Fix the error below triggered by `-Wimplicit-fallthrough`, by tagging
      it as an expected fall-through.
      
          arch/powerpc/kvm/book3s_32_mmu.c: In function ‘kvmppc_mmu_book3s_32_xlate_pte’:
          arch/powerpc/kvm/book3s_32_mmu.c:241:21: error: this statement may fall through [-Werror=implicit-fallthrough=]
                pte->may_write = true;
                ~~~~~~~~~~~~~~~^~~~~~
          arch/powerpc/kvm/book3s_32_mmu.c:242:5: note: here
               case 3:
               ^~~~
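
      The fix is the standard idiom for this warning at the time: a
      "fall through" comment immediately before the next case label. A
      sketch of the shape of the change (not the exact surrounding code):

        case 2:
                pte->may_write = true;
                /* fall through */      /* execution intentionally continues */
        case 3:
                pte->may_read = true;
                break;
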
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      ff7240cc
  2. 16 Aug 2019, 4 commits
    • powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race · da15c03b
      Paul Mackerras authored
      Testing has revealed the existence of a race condition where a XIVE
      interrupt being shut down can be in one of the XIVE interrupt queues
      (of which there are up to 8 per CPU, one for each priority) at the
      point where free_irq() is called.  If this happens, the interrupt
      fetch code can return an interrupt number which has already been shut
      down.  This can lead to various symptoms:
      
      - irq_to_desc(irq) can be NULL.  In this case, no end-of-interrupt
        function gets called, so the CPU's elevated interrupt priority
        (numerically lowered CPPR) never gets reset.  That then means that
        the CPU stops processing interrupts, causing device timeouts and
        other errors in various device drivers.
      
      - The irq descriptor or related data structures can be in the process
        of being freed as the interrupt code is using them.  This typically
        leads to crashes due to bad pointer dereferences.
      
      This race is basically what commit 62e04686 ("genirq: Add optional
      hardware synchronization for shutdown", 2019-06-28) is intended to
      fix, given a get_irqchip_state() method for the interrupt controller
      being used.  It works by polling the interrupt controller when an
      interrupt is being freed until the controller says it is not pending.
      
      With XIVE, the PQ bits of the interrupt source indicate the state of
      the interrupt source, and in particular the P bit goes from 0 to 1 at
      the point where the hardware writes an entry into the interrupt queue
      that this interrupt is directed towards.  Normally, the code will then
      process the interrupt and do an end-of-interrupt (EOI) operation which
      will reset PQ to 00 (assuming another interrupt hasn't been generated
      in the meantime).  However, there are situations where the code resets
      P even though a queue entry exists (for example, by setting PQ to 01,
      which disables the interrupt source), and also situations where the
      code leaves P at 1 after removing the queue entry (for example, this
      is done for escalation interrupts so they cannot fire again until
      they are explicitly re-enabled).
      
      The code already has a 'saved_p' flag for the interrupt source which
      indicates that a queue entry exists, although it isn't maintained
      consistently.  This patch adds a 'stale_p' flag to indicate that
      P has been left at 1 after processing a queue entry, and adds code
      to set and clear saved_p and stale_p as necessary to maintain a
      consistent indication of whether a queue entry may or may not exist.
      
      With this, we can implement xive_get_irqchip_state() by looking at
      stale_p, saved_p and the ESB PQ bits for the interrupt.
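
      A hedged sketch of what such a handler can look like, based on the
      description above (the exact upstream implementation may differ;
      xive_esb_read() and the XIVE_ESB_* constants are the existing ESB
      accessors in the XIVE driver):

        static int xive_get_irqchip_state(struct irq_data *data,
                                          enum irqchip_irq_state which, bool *state)
        {
                struct xive_irq_data *xd = irq_data_get_irq_handler_data(data);
                u8 pq;

                switch (which) {
                case IRQCHIP_STATE_ACTIVE:
                        pq = xive_esb_read(xd, XIVE_ESB_GET);
                        /*
                         * A queue entry may exist if we saved one earlier
                         * (saved_p), or if P is set and not known stale.
                         */
                        *state = xd->saved_p ||
                                 (!xd->stale_p && (pq & XIVE_ESB_VAL_P));
                        return 0;
                default:
                        return -EINVAL;
                }
        }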
      
      Some additional code is needed to handle escalation interrupts
      properly: because they are enabled and disabled in KVM assembly code,
      which does not have access to the xive_irq_data struct for the
      escalation interrupt, stale_p may be incorrect when the escalation
      interrupt is freed in kvmppc_xive_{,native_}cleanup_vcpu().
      Fortunately, we can fix it up by looking at vcpu->arch.xive_esc_on,
      with some careful attention to barriers in order to ensure the correct
      result if xive_esc_irq() races with kvmppc_xive_cleanup_vcpu().
      
      Finally, this adds code to make noise on the console (pr_crit and
      WARN_ON(1)) if we find an interrupt queue entry for an interrupt
      which does not have a descriptor.  While this won't catch the race
      reliably, if it does get triggered it will be an indication that
      the race is occurring and needs to be debugged.
      
      Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190813100648.GE9567@blackberry
      da15c03b
    • KVM: PPC: Book3S HV: Don't push XIVE context when not using XIVE device · 8d4ba9c9
      Paul Mackerras authored
      At present, when running a guest on POWER9 using HV KVM but not using
      an in-kernel interrupt controller (XICS or XIVE), for example if QEMU
      is run with the kernel_irqchip=off option, the guest entry code goes
      ahead and tries to load the guest context into the XIVE hardware, even
      though no context has been set up.
      
      To fix this, we check that the "CAM word" is non-zero before pushing
      it to the hardware.  The CAM word is initialized to a non-zero value
      in kvmppc_xive_connect_vcpu() and kvmppc_xive_native_connect_vcpu(),
      and is now cleared in kvmppc_xive_{,native_}cleanup_vcpu.
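
      Rendered in C for illustration (the actual check lives in the KVM
      guest-entry assembly; xive_cam_word is the existing vcpu field):

        /* Only push a XIVE context if one has been set up for this vCPU */
        if (!vcpu->arch.xive_cam_word)
                goto no_xive;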
      
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Reported-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190813100100.GC9567@blackberry
      8d4ba9c9
    • KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts · 959c5d51
      Paul Mackerras authored
      Escalation interrupts are interrupts sent to the host by the XIVE
      hardware when it has an interrupt to deliver to a guest VCPU but that
      VCPU is not running anywhere in the system.  Hence we disable the
      escalation interrupt for the VCPU being run when we enter the guest
      and re-enable it when the guest does an H_CEDE hypercall indicating
      it is idle.
      
      It is possible that an escalation interrupt gets generated just as we
      are entering the guest.  In that case the escalation interrupt may be
      using a queue entry in one of the interrupt queues, and that queue
      entry may not have been processed when the guest exits with an H_CEDE.
      The existing entry code detects this situation and does not clear the
      vcpu->arch.xive_esc_on flag as an indication that there is a pending
      queue entry (if the queue entry gets processed, xive_esc_irq() will
      clear the flag).  There is a comment in the code saying that if the
      flag is still set on H_CEDE, we have to abort the cede rather than
      re-enabling the escalation interrupt, lest we end up with two
      occurrences of the escalation interrupt in the interrupt queue.
      
      However, the exit code doesn't do that; it aborts the cede in the
      sense that vcpu->arch.ceded gets cleared, but it still enables the
      escalation interrupt by setting the source's PQ bits to 00.  Instead
      we need to set the PQ bits to 10, indicating that an interrupt has
      been triggered.  We also need to avoid setting vcpu->arch.xive_esc_on
      in this case (i.e. when vcpu->arch.xive_esc_on is seen to be set on
      H_CEDE), because xive_esc_irq() will run at some point and clear it,
      and if we race with that we may end up with an incorrect result
      (i.e. xive_esc_on set when the escalation interrupt has just been
      handled).
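
      The intended state change can be sketched in C as follows
      (illustrative only: the actual change is in the KVM guest-exit
      assembly; XIVE_ESB_SET_PQ_10 is the existing "set PQ to 10" ESB
      special-load offset):

        /* Re-arm the escalation source as "triggered" (PQ = 10) rather
         * than fully enabling it (PQ = 00), so that a queue entry which
         * already exists is not duplicated. */
        __raw_readq((void __iomem *)(vcpu->arch.xive_esc_vaddr +
                                     XIVE_ESB_SET_PQ_10));
        /* Do not set vcpu->arch.xive_esc_on here: xive_esc_irq() may run
         * and clear it, and racing with it would leave a stale value. */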
      
      It is extremely unlikely that having two queue entries would cause
      observable problems; theoretically it could cause queue overflow, but
      the CPU would have to have thousands of interrupts targeted to it for
      that to be possible.  However, this fix will also make it possible to
      determine accurately whether there is an unhandled escalation
      interrupt in the queue, which will be needed by the following patch.
      
      Fixes: 9b9b13a6 ("KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless ceded")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190813100349.GD9567@blackberry
      959c5d51
    • KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP · 237aed48
      Cédric Le Goater authored
      When a vCPU is brought down, the XIVE VP (Virtual Processor) is first
      disabled and then the event notification queues are freed. When
      freeing the queues, we check for possible escalation interrupts and
      free them as well.
      
      But when a XIVE VP is disabled, the underlying XIVE ENDs are also
      disabled in OPAL. When an END (Event Notification Descriptor) is
      disabled, its ESB pages (ESn and ESe) are disabled and loads return
      all 1s, which means that any access to the ESB page of the escalation
      interrupt will return invalid values.
      
      When an interrupt is freed, the shutdown handler computes a 'saved_p'
      field from the value returned by a load in xive_do_source_set_mask().
      This value is incorrect for escalation interrupts for the reason
      described above.
      
      This has no impact on Linux/KVM today because we don't make use of
      it, but upcoming changes will introduce a xive_get_irqchip_state()
      handler. That handler will use the 'saved_p' field to return the
      state of an interrupt, and with 'saved_p' being incorrect, a
      softlockup will occur.
      
      Fix the vCPU cleanup sequence by first freeing the escalation
      interrupts (if any), then disabling the XIVE VP, and finally freeing
      the queues, as sketched below.
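
      A hedged sketch of the corrected teardown order (the loop bounds and
      VP calls follow the XIVE KVM code; the two helpers are hypothetical
      stand-ins for the existing cleanup logic):

        static void xive_cleanup_vcpu_queues(struct kvmppc_xive_vcpu *xc)
        {
                int i;

                /* 1. Free escalation interrupts while the VP, and thus the
                 *    ENDs and their ESB pages, are still enabled. */
                for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++)
                        cleanup_one_escalation(xc, i);  /* hypothetical helper */

                /* 2. Disable the XIVE VP; OPAL disables the underlying ENDs. */
                xive_native_disable_vp(xc->vp_id);

                /* 3. Free the event notification queues last. */
                for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++)
                        free_one_queue(xc, i);          /* hypothetical helper */
        }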
      
      Fixes: 90c73795 ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode")
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190806172538.5087-1-clg@kaod.org
      237aed48
  3. 25 Jul 2019, 1 commit
    • treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers · d9c52522
      Masahiro Yamada authored
      UAPI headers licensed under GPL are supposed to have exception
      "WITH Linux-syscall-note" so that they can be included into non-GPL
      user space application code.
      
      The exception note is missing in some UAPI headers.
      
      Some of them slipped in by the treewide conversion commit b2441318
      ("License cleanup: add SPDX GPL-2.0 license identifier to files with
      no license"). Just run:
      
        $ git show --oneline b2441318 -- arch/x86/include/uapi/asm/
      
      I believe they are not intentional, and should be fixed too.
      
      This patch was generated by the following script:
      
        git grep -l --not -e Linux-syscall-note --and -e SPDX-License-Identifier \
          -- :arch/*/include/uapi/asm/*.h :include/uapi/ :^*/Kbuild |
        while read file
        do
                sed -i -e '/[[:space:]]OR[[:space:]]/s/\(GPL-[^[:space:]]*\)/(\1 WITH Linux-syscall-note)/g' \
                -e '/[[:space:]]or[[:space:]]/s/\(GPL-[^[:space:]]*\)/(\1 WITH Linux-syscall-note)/g' \
                -e '/[[:space:]]OR[[:space:]]/!{/[[:space:]]or[[:space:]]/!s/\(GPL-[^[:space:]]*\)/\1 WITH Linux-syscall-note/g}' $file
        done
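
      For illustration, the rewrites the script performs look like this
      (example header lines, not specific files):

        /* Before */
        /* SPDX-License-Identifier: GPL-2.0 */
        /* After */
        /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */

        /* Before (dual-licensed) */
        /* SPDX-License-Identifier: GPL-2.0 OR MIT */
        /* After (dual-licensed) */
        /* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */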
      
      After this patch is applied, there are 5 UAPI headers that do not contain
      "WITH Linux-syscall-note". They are kept untouched since this exception
      applies only to GPL variants.
      
        $ git grep --not -e Linux-syscall-note --and -e SPDX-License-Identifier \
          -- :arch/*/include/uapi/asm/*.h :include/uapi/ :^*/Kbuild
        include/uapi/drm/panfrost_drm.h:/* SPDX-License-Identifier: MIT */
        include/uapi/linux/batman_adv.h:/* SPDX-License-Identifier: MIT */
        include/uapi/linux/qemu_fw_cfg.h:/* SPDX-License-Identifier: BSD-3-Clause */
        include/uapi/linux/vbox_err.h:/* SPDX-License-Identifier: MIT */
        include/uapi/linux/virtio_iommu.h:/* SPDX-License-Identifier: BSD-3-Clause */
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9c52522
  4. 24 Jul 2019, 1 commit
  5. 22 Jul 2019, 4 commits
    • powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails · 3a855b7a
      Vaibhav Jain authored
      In some cases initial bind of scm memory for an lpar can fail if
      previously it wasn't released using a scm-unbind hcall. This situation
      can arise due to panic of the previous kernel or forced lpar
      fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error.
      
      To mitigate such cases the patch updates papr_scm_probe() to force a
      call to drc_pmem_unbind() in case the initial bind of scm memory fails
      with EBUSY error. In case scm-bind operation again fails after the
      forced scm-unbind then we follow the existing error path. We also
      update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
      and indicate it as a EBUSY error back to the caller.
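
      A hedged sketch of the resulting probe flow (drc_pmem_bind() and
      drc_pmem_unbind() are the existing papr_scm helpers; error handling
      condensed):

        rc = drc_pmem_bind(p);
        if (rc == -EBUSY) {
                /* H_SCM_BIND_MEM saw H_OVERLAP: a previous kernel (panic,
                 * fadump) left the memory bound. Force an unbind, retry. */
                drc_pmem_unbind(p);
                rc = drc_pmem_bind(p);
        }
        if (rc)
                goto err;       /* existing error path */
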
      Suggested-by: "Oliver O'Halloran" <oohall@gmail.com>
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190629160610.23402-4-vaibhav@linux.ibm.com
      3a855b7a
    • powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL · 0d7fc080
      Vaibhav Jain authored
      A new hcall named H_SCM_UNBIND_ALL has been introduced that can
      unbind all or specific scm memory assigned to an lpar. This is more
      efficient than using H_SCM_UNBIND_MEM, as we currently don't support
      partial unbind of scm memory.

      Hence this patch makes the following changes to drc_pmem_unbind():

          * Update drc_pmem_unbind() to replace the hcall H_SCM_UNBIND_MEM
            with H_SCM_UNBIND_ALL.

          * Update drc_pmem_unbind() to handle cases where PHYP asks the
            guest kernel to wait for a specific amount of time before
            retrying the hcall, via the 'LONG_BUSY' return value (see the
            sketch after this list).

          * Ensure an appropriate error code is returned from the function
            in case of an error.
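
      A hedged sketch of the retry loop from the second bullet (the hcall,
      scope and long-busy macros exist in the powerpc tree; the token and
      error mapping are illustrative):

        unsigned long ret_buf[PLPAR_HCALL_BUFSIZE];
        int64_t rc;

        do {
                rc = plpar_hcall(H_SCM_UNBIND_ALL, ret_buf,
                                 H_UNBIND_SCOPE_DRC, p->drc_index, token);
                if (H_IS_LONG_BUSY(rc)) {
                        /* PHYP asked us to wait before retrying the hcall */
                        msleep(get_longbusy_msecs(rc));
                        rc = H_BUSY;
                }
        } while (rc == H_BUSY);

        return (rc == H_SUCCESS) ? 0 : -ENXIO;
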
      Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190629160610.23402-3-vaibhav@linux.ibm.com
      0d7fc080
    • powerpc/pseries: Update SCM hcall op-codes in hvcall.h · 6d140e75
      Vaibhav Jain authored
      Update hvcall.h to include op-codes for the new hcalls introduced to
      manage SCM memory. Also update existing hcall definitions to reflect
      the current PAPR specification for SCM.

      The removed hcall op-codes H_SCM_MEM_QUERY and H_SCM_BLOCK_CLEAR were
      transient proposals; their support was never implemented by PowerVM,
      nor were they used anywhere in the Linux kernel. Hence we don't
      expect anyone to be impacted by this change.
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190629160610.23402-2-vaibhav@linux.ibm.com
      6d140e75
    • powerpc/tm: Fix oops on sigreturn on systems without TM · f16d80b7
      Michael Neuling authored
      On systems like P9 powernv where we have no TM (or P8 booted with
      ppc_tm=off), userspace can construct a signal context which still has
      the MSR TS bits set. The kernel tries to restore this context which
      results in the following crash:
      
        Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033
        Oops: Unrecoverable exception, sig: 6 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ff #69
        NIP:  c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000
        REGS: c00000003fffbd70 TRAP: 0700   Not tainted  (5.2.0-11045-g7142b497d8)
        MSR:  8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]>  CR: 42004242  XER: 00000000
        CFAR: c0000000000022e0 IRQMASK: 0
        GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669
        GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8
        GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000
        GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420
        GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000
        GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000
        GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728
        NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80
        LR [00007fffb2d67e48] 0x7fffb2d67e48
        Call Trace:
        Instruction dump:
        e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
        e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18
      
      The problem is that the signal code assumes TM is enabled when
      CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case,
      as on P9 powernv or when `ppc_tm=off` is used on P8.
      
      This means any local user can crash the system.
      
      Fix the problem by returning a bad stack frame to the user if they try
      to set the MSR TS bits with sigreturn() on systems where TM is not
      supported.
      
      Found with sigfuz kernel selftest on P9.
      
      This fixes CVE-2019-13648.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Cc: stable@vger.kernel.org # v3.9
      Reported-by: Praveen Pandey <Praveen.Pandey@in.ibm.com>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org
      f16d80b7
  6. 19 Jul 2019, 4 commits
    • powerpc/dma: Fix invalid DMA mmap behavior · b4fc36e6
      Shawn Anastasio authored
      The refactor of powerpc DMA functions in commit 6666cc17
      ("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly changed
      the way DMA mappings are handled on powerpc. Since this change, all
      mapped pages are marked as cache-inhibited through the default
      implementation of arch_dma_mmap_pgprot. This differs from the
      previous behavior of only marking pages in noncoherent mappings as
      cache-inhibited, and has resulted in sporadic system crashes in
      certain hardware configurations and workloads (see Bugzilla).

      This commit restores the previous correct behavior by providing an
      implementation of arch_dma_mmap_pgprot that only marks pages in
      noncoherent mappings as cache-inhibited. As this behavior should be
      universal for all powerpc platforms, a new file, dma-generic.c, was
      created to store it.
      
      Fixes: 6666cc17 ("powerpc/dma: remove dma_nommu_mmap_coherent")
      # NOTE: fixes commit 6666cc17 released in v5.1.
      # Consider a stable tag:
      # Cc: stable@vger.kernel.org # v5.1+
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: Shawn Anastasio <shawn@anastas.io>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190717235437.12908-1-shawn@anastas.io
      b4fc36e6
    • KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create fails · 9798f4ea
      Cédric Le Goater authored
      The XIVE device structure is now allocated in kvmppc_xive_get_device()
      and kfree'd in kvmppc_core_destroy_vm(). In case of an OPAL error when
      allocating the XIVE VPs, the kfree() call in kvmppc_xive_*create()
      will result in a double free and corrupt the host memory.
      
      Fixes: 5422e951 ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method")
      Cc: stable@vger.kernel.org # v5.2+
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/6ea6998b-a890-2511-01d1-747d7621eb19@kaod.org
      9798f4ea
    • mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns · fbcf73ce
      David Hildenbrand authored
      walk_memory_range() was once used to iterate over sections.  Now, it
      iterates over memory blocks.  Rename the function, fixup the
      documentation.
      
      Also, pass start+size instead of PFNs, which is what most callers
      already have at hand.  (we'll rework link_mem_sections() most probably
      soon)
      
      Follow-up patches will rework, simplify, and move walk_memory_blocks()
      to drivers/base/memory.c.
      
      Note: walk_memory_blocks() only works correctly right now if the
      start_pfn is aligned to a section start.  This is the case right now,
      but we'll generalize the function in a follow up patch so the semantics
      match the documentation.
      
      [akpm@linux-foundation.org: remove unused variable]
      Link: http://lkml.kernel.org/r/20190614100114.311-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Rashmica Gupta <rashmica.g@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fbcf73ce
    • mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVE · 80ec922d
      David Hildenbrand authored
      We want to improve error handling while adding memory by allowing to use
      arch_remove_memory() and __remove_pages() even if
      CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
      
      	arch_add_memory()
      	rc = do_something();
      	if (rc) {
      		arch_remove_memory();
      	}
      
      We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
      quite some dependencies for memory offlining.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      80ec922d
  7. 18 Jul 2019, 1 commit
    • powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() · 4d202c8c
      Gautham R. Shenoy authored
      xive_find_target_in_mask() contains the following for(;;) loop, which
      has a bug when @first == cpumask_first(@mask) and condition 1 fails
      to hold for every CPU in @mask. In that case we loop forever.
      
        first = cpu;
        for (;;) {
                if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
                        return cpu;
                cpu = cpumask_next(cpu, mask);
                if (cpu == first) // condition 2
                        break;

                if (cpu >= nr_cpu_ids) // condition 3
                        cpu = cpumask_first(mask);
        }
      
      This is because, when @first == cpumask_first(@mask), we never hit
      condition 2 (cpu == first): prior to this check we would have
      executed "cpu = cpumask_next(cpu, mask)", which sets @cpu either to a
      value greater than @first or to nr_cpu_ids. When this is coupled with
      the fact that condition 1 is not met, we will never exit this loop.
      
      This was discovered by the hard-lockup detector while running LTP
      tests concurrently with SMT switch tests.
      
       watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
       watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
       watchdog: CPU 68 Hard LOCKUP
       watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
       CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
       NIP:  c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
       REGS: c000201fff3c7d80 TRAP: 0100   Not tainted  (4.18.0-100.el8.ppc64le)
       MSR:  9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 24028424  XER: 00000000
       CFAR: c0000000006f558c IRQMASK: 1
       GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
       GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
       GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
       GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
       GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
       GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
       GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
       GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
       NIP [c0000000006f5578] find_next_bit+0x38/0x90
       LR [c000000000cba9ec] cpumask_next+0x2c/0x50
       Call Trace:
       [c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
       [c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
       [c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
       [c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
       [c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
       [c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
       [c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
       [c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
       [c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
       [c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
       [c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
       [c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
       [c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
       [c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
       [c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
       [c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
       [c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
       Instruction dump:
       78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
       4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040
      
      To fix this, move the check for condition 2 after the check for
      condition 3, so that we are able to break out of the loop soon after
      iterating through all the CPUs in the @mask in the problem case. Use
      do..while() to achieve this.
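
      The fixed loop, as described (a sketch; the names follow the snippet
      above):

        first = cpu;
        do {
                if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
                        return cpu;
                cpu = cpumask_next(cpu, mask);
                if (cpu >= nr_cpu_ids) // condition 3, now checked first
                        cpu = cpumask_first(mask);
        } while (cpu != first); // condition 2 terminates the loop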
      
      Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Reported-by: Indira P. Joga <indira.priya@in.ibm.com>
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.com
      4d202c8c
  8. 17 Jul 2019, 7 commits
  9. 15 Jul 2019, 6 commits
    • powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA · 03800e05
      Andrea Arcangeli authored
      Commit 25078dc1 first introduced an off-by-one error in the ZONE_DMA
      initialization for PPC_BOOK3E_64=y, and since commit 9739ab7e the
      off-by-one applies to PPC32=y too. This simply corrects the
      off-by-one and should resolve crashes like the one below:

      [   65.179101] page 0x7fff outside node 0 zone DMA [ 0x0 - 0x7fff ]

      Unfortunately, in various MM places "max" means a non-inclusive end
      of range. The free_area_init_nodes() max_zone_pfn parameter is one
      case, and MAX_ORDER is another (unrelated) one that comes to mind.
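
      A hedged sketch of the kind of change involved, assuming the
      ZONE_DMA limit is computed from ARCH_ZONE_DMA_BITS as in the commits
      named above:

        /* Before: off by one, since max_zone_pfn is an exclusive end */
        max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
                        ((1UL << ARCH_ZONE_DMA_BITS) - 1) >> PAGE_SHIFT);

        /* After: the exclusive end pfn of the 2^ARCH_ZONE_DMA_BITS range */
        max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
                        1UL << (ARCH_ZONE_DMA_BITS - PAGE_SHIFT));
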
      Reported-by: Zorro Lang <zlang@redhat.com>
      Fixes: 25078dc1 ("powerpc: use mm zones more sensibly")
      Fixes: 9739ab7e ("powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac")
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190625141727.2883-1-aarcange@redhat.com
      03800e05
    • KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries · c8b4083d
      Suraj Jitindar Singh authored
      The Performance Stop Status and Control Register (PSSCR) is used to
      control the power saving facilities of the processor. This register
      has various fields, some of which can be modified only in hypervisor
      state, and others which can be modified in both hypervisor and
      privileged non-hypervisor state. The bits which can be modified in
      privileged non-hypervisor state are referred to as guest visible.
      
      Currently the L0 hypervisor saves and restores both its own host
      value as well as the guest value of the PSSCR when context switching
      between the hypervisor and guest. However, a nested hypervisor
      running its own nested guests (as indicated by kvmhv_on_pseries())
      doesn't context switch the PSSCR register. That means if a nested
      (L2) guest modifies the PSSCR, then the L1 guest hypervisor will run
      with that modified value, and if the L1 guest hypervisor modifies the
      PSSCR and then goes to run the nested (L2) guest again, the L2 PSSCR
      value will be lost.
      
      Fix this by having the (L1) nested hypervisor save and restore both
      its host and the guest PSSCR value when entering and exiting a nested
      (L2) guest. Note that only the guest visible parts of the PSSCR are
      context switched, since this is all the L1 nested hypervisor can
      access. This is fine, however, as these are the only fields the L0
      hypervisor provides guest control of anyway, and so all other fields
      can be ignored.
      
      This could also have been implemented by adding the PSSCR register to
      the hv_regs passed to the L0 hypervisor as input to the
      H_ENTER_NESTED hcall; however, this would have meant updating the
      structure layout and thus required modifications to both the L0 and
      L1 kernels. The approach used here doesn't require L0 kernel
      modifications while achieving the same result.
      
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190703012022.15644-3-sjitindarsingh@gmail.com
      c8b4083d
    • powerpc/pmu: Set pmcregs_in_use in paca when running as LPAR · 28d2a6e6
      Suraj Jitindar Singh authored
      The ability to run nested guests under KVM means that a guest can
      also act as a hypervisor for its own nested guests. Currently
      ppc_set_pmu_inuse() assumes either that FW_FEATURE_LPAR is set,
      indicating a guest environment, and so sets the pmcregs_in_use flag
      in the lppaca, or that it isn't set, indicating a hypervisor
      environment, and so sets the pmcregs_in_use flag in the paca.
      
      The pmcregs_in_use flag in the lppaca is used to communicate this
      information to a hypervisor and so must be set in a guest environment.
      The pmcregs_in_use flag in the paca is used by KVM code to determine
      whether the host state of the performance monitoring unit (PMU) must
      be saved and restored when running a guest.
      
      Thus when a guest also acts as a hypervisor, it must set this bit in
      both places, since it needs to ensure both that the real hypervisor
      saves its PMU registers when it runs (which requires the
      pmcregs_in_use flag in the lppaca), and that it saves its own PMU
      registers when running a nested guest (which requires the
      pmcregs_in_use flag in the paca).
      
      Modify ppc_set_pmu_inuse() so that the pmcregs_in_use bit is set in
      both the lppaca and the paca when a guest (LPAR) is running with the
      capability of running its own guests (CONFIG_KVM_BOOK3S_HV_POSSIBLE).
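
      A hedged sketch of the resulting helper, following the description
      above (the exact config guards may differ from the patch):

        static inline void ppc_set_pmu_inuse(int inuse)
        {
        #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
                if (firmware_has_feature(FW_FEATURE_LPAR)) {
        #ifdef CONFIG_PPC_PSERIES
                        /* We are a guest: tell our hypervisor to save the PMU */
                        get_lppaca()->pmcregs_in_use = inuse;
        #endif
                }
        #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
                /* We may run guests of our own: record it in the paca too */
                get_paca()->pmcregs_in_use = inuse;
        #endif
        #endif
        }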
      
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190703012022.15644-2-sjitindarsingh@gmail.com
      28d2a6e6
    • KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting · 63279eeb
      Suraj Jitindar Singh authored
      The performance monitoring unit (PMU) registers are saved on guest
      exit when the guest has set the pmcregs_in_use flag in its lppaca, if
      it exists, or unconditionally if it doesn't. If a nested guest is
      being run, then the hypervisor doesn't, and in most cases can't, know
      whether the PMU registers are in use, since it doesn't know the
      location of the lppaca for the nested guest, although it may have one
      for its immediate guest. This results in the values of these
      registers being lost across nested guest entry and exit in the case
      where the nested guest was making use of the performance monitoring
      facility while its guest hypervisor wasn't.
      
      Furthermore, the hypervisor could interrupt a guest hypervisor
      between when it has loaded up the PMU registers and its call to
      H_ENTER_NESTED, or between returning from the nested guest to the
      guest hypervisor and the guest hypervisor reading the PMU registers,
      in kvmhv_p9_guest_entry(). This means that it isn't sufficient to
      just save the PMU registers when entering or exiting a nested guest;
      it is necessary to always save the PMU registers whenever a guest is
      capable of running nested guests, to ensure the register values
      aren't lost in the context switch.
      
      Ensure the PMU register values are preserved by always saving their
      value into the vcpu struct when a guest is capable of running nested
      guests.
      
      This should have minimal performance impact; however, any impact can
      be avoided by booting a guest with
      "-machine pseries,cap-nested-hv=false" on the qemu command line.
      
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190703012022.15644-1-sjitindarsingh@gmail.com
      63279eeb
    • powerpc/mm: Limit rma_size to 1TB when running without HV mode · da0ef933
      Suraj Jitindar Singh authored
      The virtual real mode addressing (VRMA) mechanism is used when a
      partition is using HPT (Hash Page Table) translation and performs real
      mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this mode
      effective address bits 0:23 are treated as zero (i.e. the access is
      aliased to 0) and the access is performed using an implicit 1TB SLB
      entry.
      
      The size of the RMA (Real Memory Area) is communicated to the guest
      as the size of the first memory region in the device tree, and
      because of the mechanism described above it can be expected not to
      exceed 1TB. In the event that the host erroneously represents the RMA
      as being larger than 1TB, guest accesses in real mode to memory
      addresses above 1TB will be aliased down to below 1TB. This means
      that a memory access performed in real mode may differ from one
      performed in virtual mode for the same memory address, which would
      likely have unintended consequences.
      
      To avoid this outcome, have the guest explicitly limit the size of
      the RMA to the current maximum, which is 1TB. This means that even if
      the first memory block is larger than 1TB, only the first 1TB should
      be accessed in real mode.
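
      A hedged sketch of the clamp (illustrative; the real change is in
      the hash MMU setup code):

        /* VRMA aliases real-mode accesses into the first 1TB, so never
         * use an RMA larger than that when we lack HV mode. */
        if (!early_cpu_has_feature(CPU_FTR_HVMODE))
                ppc64_rma_size = min_t(u64, ppc64_rma_size, 1UL << 40);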
      
      Fixes: c610d65c ("powerpc/pseries: lift RTAS limit for hash")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190710052018.14628-1-sjitindarsingh@gmail.com
      da0ef933
    • arch: mark syscall number 435 reserved for clone3 · 1a271a68
      Christian Brauner authored
      A while ago Arnd made it possible to give new system calls the same
      syscall number on all architectures (except alpha). To not break this
      nice new feature, let's mark 435 for clone3 as reserved on all
      architectures that do not yet implement it.
      Even if an architecture does not plan to implement it, this ensures
      that new system calls coming after clone3 will have the same number
      on all architectures.
      Signed-off-by: Christian Brauner <christian@brauner.io>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linux-mips@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Link: https://lore.kernel.org/r/20190714192205.27190-2-christian@brauner.io
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Christian Brauner <christian@brauner.io>
      1a271a68
  10. 13 Jul 2019, 4 commits
  11. 12 Jul 2019, 1 commit
  12. 11 Jul 2019, 1 commit
    • powerpc/eeh: Handle hugepages in ioremap space · 33439620
      Oliver O'Halloran authored
      In commit d909f9109c30 ("powerpc/64s/radix: Enable
      HAVE_ARCH_HUGE_VMAP") support for using hugepages in the vmalloc and
      ioremap areas was enabled for radix. Unfortunately this broke EEH
      MMIO error checking.
      
      Detection works by inserting a hook which checks the results of the
      ioreadXX() set of functions.  When a read returns an all-0xFF
      response we need to check for an error, which we do by mapping the
      (virtual) MMIO address back to a physical address, then mapping the
      physical address to a PCI device via an interval tree.
      
      When translating virt -> phys we currently assume the ioremap space
      is only populated by PAGE_SIZE mappings. If a hugepage mapping is
      found we emit a WARN_ON(), but otherwise handle the check as though a
      normal page was found. In pathological cases, such as copying a
      buffer containing a lot of 0xFFs from BAR memory, this can result in
      the system not booting because it's too busy printing WARN_ON()s.
      
      There's no real reason to assume huge pages can't be present, and
      we're perfectly capable of handling them, so do that.
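
      A hedged sketch of the hugepage-aware virt-to-phys step
      (find_init_mm_pte() is the existing powerpc page-table walker;
      details condensed):

        unsigned int hugepage_shift;
        unsigned long pa;
        pte_t *ptep;

        ptep = find_init_mm_pte(token, &hugepage_shift);
        if (!ptep)
                return token;

        pa = pte_pfn(*ptep) << PAGE_SHIFT;      /* mapping-aligned base */
        if (!hugepage_shift)
                hugepage_shift = PAGE_SHIFT;
        /* keep the offset bits appropriate to the mapping size */
        pa |= token & ((1UL << hugepage_shift) - 1);
        return pa;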
      
      Fixes: d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
      Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190710150517.27114-1-oohall@gmail.com
      33439620
  13. 10 Jul 2019, 3 commits
    • powerpc/boot: pass CONFIG options in a simpler and more robust way · 4ba7f80f
      Masahiro Yamada authored
      Commit 5e9dcb61 ("powerpc/boot: Expose Kconfig symbols to wrapper")
      was wrong, but commit e41b93a6 ("powerpc/boot: Fix build failures
      with -j 1") was also wrong.
      
      The correct dependency is:
      
        $(obj)/serial.o: $(obj)/autoconf.h
      
      However, I do not see the reason why we need to copy autoconf.h to
      arch/powerpc/boot/. Nor do I see consistency in the way CONFIG
      options are passed.
      
      decompress.c references CONFIG_KERNEL_GZIP and CONFIG_KERNEL_XZ, which
      are passed via the command line.
      
      serial.c includes autoconf.h to reference a couple of CONFIG options,
      but this is fragile because we often forget to include "autoconf.h"
      from source files.
      
      In fact, it is already broken.
      
      ppc_asm.h references CONFIG_PPC_8xx, but utils.S is not given any way
      to access CONFIG options. So, CONFIG_PPC_8xx is never defined here.
      
      Pass $(LINUXINCLUDE) to make sure CONFIG options are accessible from
      all .c and .S files in arch/powerpc/boot/.
      
      I also removed the -traditional flag to make include/linux/kconfig.h
      work. This flag makes the preprocessor imitate the behavior of the
      pre-standard C compiler, but I do not understand why it is necessary.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190705100144.28785-2-yamada.masahiro@socionext.com
      4ba7f80f
    • powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h · 9e005b76
      Masahiro Yamada authored
      The next commit will make the way of passing CONFIG options more robust.
      Unfortunately, it would uncover another hidden issue; without this
      commit, skiroot_defconfig would be broken like this:
      
      |   WRAP    arch/powerpc/boot/zImage.pseries
      | arch/powerpc/boot/wrapper.a(decompress.o): In function `bcj_powerpc.isra.10':
      | decompress.c:(.text+0x720): undefined reference to `get_unaligned_be32'
      | decompress.c:(.text+0x7a8): undefined reference to `put_unaligned_be32'
      | make[1]: *** [arch/powerpc/boot/Makefile;383: arch/powerpc/boot/zImage.pseries] Error 1
      | make: *** [arch/powerpc/Makefile;295: zImage] Error 2
      
      skiroot_defconfig is the only defconfig that enables CONFIG_KERNEL_XZ
      for ppc, which has never been correctly built before.
      
      I figured out the root cause in lib/decompress_unxz.c:
      
      | #ifdef CONFIG_PPC
      | #      define XZ_DEC_POWERPC
      | #endif
      
      CONFIG_PPC is undefined here in the ppc bootwrapper because
      autoconf.h is not included except by arch/powerpc/boot/serial.c.
      
      XZ_DEC_POWERPC is not defined, therefore, bcj_powerpc() is not compiled
      for the bootwrapper.
      
      With the next commit passing CONFIG_PPC correctly, we would realize that
      {get,put}_unaligned_be32 was missing.
      
      Unlike the other decompressors, the ppc bootwrapper duplicates all the
      necessary helpers in arch/powerpc/boot/.
      
      The other architectures define __KERNEL__ and pull in helpers for
      building the decompressors.
      
      If ppc bootwrapper had defined __KERNEL__, lib/xz/xz_private.h would
      have included <asm/unaligned.h>:
      
      | #ifdef __KERNEL__
      | #       include <linux/xz.h>
      | #       include <linux/kernel.h>
      | #       include <asm/unaligned.h>
      
      However, doing so would cause tons of definition conflicts since the
      bootwrapper has duplicated everything.
      
      I just added copies of {get,put}_unaligned_be32, following the
      bootwrapper coding convention.
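
      A sketch of such helpers in the bootwrapper's freestanding style
      (byte-wise accesses, no kernel headers; the actual additions may
      differ slightly):

        static inline u32 get_unaligned_be32(const void *p)
        {
                const u8 *q = p;

                return (u32)q[0] << 24 | (u32)q[1] << 16 |
                       (u32)q[2] << 8 | q[3];
        }

        static inline void put_unaligned_be32(u32 val, void *p)
        {
                u8 *q = p;

                q[0] = val >> 24;
                q[1] = val >> 16;
                q[2] = val >> 8;
                q[3] = val;
        }
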
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190705100144.28785-1-yamada.masahiro@socionext.com
      9e005b76
    • powerpc/irq: Don't WARN continuously in arch_local_irq_restore() · 0fc12c02
      Michael Ellerman authored
      When CONFIG_PPC_IRQ_SOFT_MASK_DEBUG is enabled (uncommon), we have a
      series of WARN_ON's in arch_local_irq_restore().
      
      These are "should never happen" conditions, but if they do happen they
      can flood the console and render the system unusable. So switch them
      to WARN_ON_ONCE().
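
      The change is mechanical; for example, one of the debug checks in
      arch_local_irq_restore(), sketched:

        /* Before: fires on every violation and can flood the console */
        WARN_ON(!(mfmsr() & MSR_EE));

        /* After: warns once, keeping the system usable */
        WARN_ON_ONCE(!(mfmsr() & MSR_EE));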
      
      Fixes: e2b36d59 ("powerpc/64: Don't trace code that runs with the soft irq mask unreconciled")
      Fixes: 9b81c021 ("powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] closely")
      Fixes: 7c0482e3 ("powerpc/irq: Fix another case of lazy IRQ state getting out of sync")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190708061046.7075-1-mpe@ellerman.id.au
      0fc12c02
  14. 05 Jul 2019, 1 commit