1. 30 4月, 2019 23 次提交
    • C
      KVM: PPC: Book3S HV: XIVE: Add passthrough support · 232b984b
      Cédric Le Goater 提交于
      The KVM XICS-over-XIVE device and the proposed KVM XIVE native device
      implement an IRQ space for the guest using the generic IPI interrupts
      of the XIVE IC controller. These interrupts are allocated at the OPAL
      level and "mapped" into the guest IRQ number space in the range 0-0x1FFF.
      Interrupt management is performed in the XIVE way: using loads and
      stores on the addresses of the XIVE IPI interrupt ESB pages.
      
      Both KVM devices share the same internal structure caching information
      on the interrupts, among which the xive_irq_data struct containing the
      addresses of the IPI ESB pages and an extra one in case of pass-through.
      The later contains the addresses of the ESB pages of the underlying HW
      controller interrupts, PHB4 in all cases for now.
      
      A guest, when running in the XICS legacy interrupt mode, lets the KVM
      XICS-over-XIVE device "handle" interrupt management, that is to
      perform the loads and stores on the addresses of the ESB pages of the
      guest interrupts. However, when running in XIVE native exploitation
      mode, the KVM XIVE native device exposes the interrupt ESB pages to
      the guest and lets the guest perform directly the loads and stores.
      
      The VMA exposing the ESB pages make use of a custom VM fault handler
      which role is to populate the VMA with appropriate pages. When a fault
      occurs, the guest IRQ number is deduced from the offset, and the ESB
      pages of associated XIVE IPI interrupt are inserted in the VMA (using
      the internal structure caching information on the interrupts).
      
      Supporting device passthrough in the guest running in XIVE native
      exploitation mode adds some extra refinements because the ESB pages
      of a different HW controller (PHB4) need to be exposed to the guest
      along with the initial IPI ESB pages of the XIVE IC controller. But
      the overall mechanic is the same.
      
      When the device HW irqs are mapped into or unmapped from the guest
      IRQ number space, the passthru_irq helpers, kvmppc_xive_set_mapped()
      and kvmppc_xive_clr_mapped(), are called to record or clear the
      passthrough interrupt information and to perform the switch.
      
      The approach taken by this patch is to clear the ESB pages of the
      guest IRQ number being mapped and let the VM fault handler repopulate.
      The handler will insert the ESB page corresponding to the HW interrupt
      of the device being passed-through or the initial IPI ESB page if the
      device is being removed.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      232b984b
    • C
      KVM: PPC: Book3S HV: XIVE: Add a mapping for the source ESB pages · 6520ca64
      Cédric Le Goater 提交于
      Each source is associated with an Event State Buffer (ESB) with a
      even/odd pair of pages which provides commands to manage the source:
      to trigger, to EOI, to turn off the source for instance.
      
      The custom VM fault handler will deduce the guest IRQ number from the
      offset of the fault, and the ESB page of the associated XIVE interrupt
      will be inserted into the VMA using the internal structure caching
      information on the interrupts.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      6520ca64
    • C
      KVM: PPC: Book3S HV: XIVE: Add a TIMA mapping · 39e9af3d
      Cédric Le Goater 提交于
      Each thread has an associated Thread Interrupt Management context
      composed of a set of registers. These registers let the thread handle
      priority management and interrupt acknowledgment. The most important
      are :
      
          - Interrupt Pending Buffer     (IPB)
          - Current Processor Priority   (CPPR)
          - Notification Source Register (NSR)
      
      They are exposed to software in four different pages each proposing a
      view with a different privilege. The first page is for the physical
      thread context and the second for the hypervisor. Only the third
      (operating system) and the fourth (user level) are exposed the guest.
      
      A custom VM fault handler will populate the VMA with the appropriate
      pages, which should only be the OS page for now.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      39e9af3d
    • C
      KVM: PPC: Book3S HV: XIVE: Add get/set accessors for the VP XIVE state · e4945b9d
      Cédric Le Goater 提交于
      The state of the thread interrupt management registers needs to be
      collected for migration. These registers are cached under the
      'xive_saved_state.w01' field of the VCPU when the VPCU context is
      pulled from the HW thread. An OPAL call retrieves the backup of the
      IPB register in the underlying XIVE NVT structure and merges it in the
      KVM state.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e4945b9d
    • C
      KVM: PPC: Book3S HV: XIVE: Add a control to dirty the XIVE EQ pages · e6714bd1
      Cédric Le Goater 提交于
      When migration of a VM is initiated, a first copy of the RAM is
      transferred to the destination before the VM is stopped, but there is
      no guarantee that the EQ pages in which the event notifications are
      queued have not been modified.
      
      To make sure migration will capture a consistent memory state, the
      XIVE device should perform a XIVE quiesce sequence to stop the flow of
      event notifications and stabilize the EQs. This is the purpose of the
      KVM_DEV_XIVE_EQ_SYNC control which will also marks the EQ pages dirty
      to force their transfer.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e6714bd1
    • C
      KVM: PPC: Book3S HV: XIVE: Add a control to sync the sources · 7b46b616
      Cédric Le Goater 提交于
      This control will be used by the H_INT_SYNC hcall from QEMU to flush
      event notifications on the XIVE IC owning the source.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7b46b616
    • C
      KVM: PPC: Book3S HV: XIVE: Add a global reset control · 5ca80647
      Cédric Le Goater 提交于
      This control is to be used by the H_INT_RESET hcall from QEMU. Its
      purpose is to clear all configuration of the sources and EQs. This is
      necessary in case of a kexec (for a kdump kernel for instance) to make
      sure that no remaining configuration is left from the previous boot
      setup so that the new kernel can start safely from a clean state.
      
      The queue 7 is ignored when the XIVE device is configured to run in
      single escalation mode. Prio 7 is used by escalations.
      
      The XIVE VP is kept enabled as the vCPU is still active and connected
      to the XIVE device.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      5ca80647
    • C
      KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration · 13ce3297
      Cédric Le Goater 提交于
      These controls will be used by the H_INT_SET_QUEUE_CONFIG and
      H_INT_GET_QUEUE_CONFIG hcalls from QEMU to configure the underlying
      Event Queue in the XIVE IC. They will also be used to restore the
      configuration of the XIVE EQs and to capture the internal run-time
      state of the EQs. Both 'get' and 'set' rely on an OPAL call to access
      the EQ toggle bit and EQ index which are updated by the XIVE IC when
      event notifications are enqueued in the EQ.
      
      The value of the guest physical address of the event queue is saved in
      the XIVE internal xive_q structure for later use. That is when
      migration needs to mark the EQ pages dirty to capture a consistent
      memory state of the VM.
      
      To be noted that H_INT_SET_QUEUE_CONFIG does not require the extra
      OPAL call setting the EQ toggle bit and EQ index to configure the EQ,
      but restoring the EQ state will.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      13ce3297
    • C
      KVM: PPC: Book3S HV: XIVE: Add a control to configure a source · e8676ce5
      Cédric Le Goater 提交于
      This control will be used by the H_INT_SET_SOURCE_CONFIG hcall from
      QEMU to configure the target of a source and also to restore the
      configuration of a source when migrating the VM.
      
      The XIVE source interrupt structure is extended with the value of the
      Effective Interrupt Source Number. The EISN is the interrupt number
      pushed in the event queue that the guest OS will use to dispatch
      events internally. Caching the EISN value in KVM eases the test when
      checking if a reconfiguration is indeed needed.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e8676ce5
    • C
      KVM: PPC: Book3S HV: XIVE: add a control to initialize a source · 4131f83c
      Cédric Le Goater 提交于
      The XIVE KVM device maintains a list of interrupt sources for the VM
      which are allocated in the pool of generic interrupts (IPIs) of the
      main XIVE IC controller. These are used for the CPU IPIs as well as
      for virtual device interrupts. The IRQ number space is defined by
      QEMU.
      
      The XIVE device reuses the source structures of the XICS-on-XIVE
      device for the source blocks (2-level tree) and for the source
      interrupts. Under XIVE native, the source interrupt caches mostly
      configuration information and is less used than under the XICS-on-XIVE
      device in which hcalls are still necessary at run-time.
      
      When a source is initialized in KVM, an IPI interrupt source is simply
      allocated at the OPAL level and then MASKED. KVM only needs to know
      about its type: LSI or MSI.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4131f83c
    • C
      KVM: PPC: Book3S HV: XIVE: Introduce a new capability KVM_CAP_PPC_IRQ_XIVE · eacc56bb
      Cédric Le Goater 提交于
      The user interface exposes a new capability KVM_CAP_PPC_IRQ_XIVE to
      let QEMU connect the vCPU presenters to the XIVE KVM device if
      required. The capability is not advertised for now as the full support
      for the XIVE native exploitation mode is not yet available. When this
      is case, the capability will be advertised on PowerNV Hypervisors
      only. Nested guests (pseries KVM Hypervisor) are not supported.
      
      Internally, the interface to the new KVM device is protected with a
      new interrupt mode: KVMPPC_IRQ_XIVE.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      eacc56bb
    • C
      KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode · 90c73795
      Cédric Le Goater 提交于
      This is the basic framework for the new KVM device supporting the XIVE
      native exploitation mode. The user interface exposes a new KVM device
      to be created by QEMU, only available when running on a L0 hypervisor.
      Support for nested guests is not available yet.
      
      The XIVE device reuses the device structure of the XICS-on-XIVE device
      as they have a lot in common. That could possibly change in the future
      if the need arise.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      90c73795
    • S
      KVM: PPC: Book3S HV: Save/restore vrsave register in kvmhv_p9_guest_entry() · 44b198ae
      Suraj Jitindar Singh 提交于
      On POWER9 and later processors where the host can schedule vcpus on a
      per thread basis, there is a streamlined entry path used when the guest
      is radix. This entry path saves/restores the fp and vr state in
      kvmhv_p9_guest_entry() by calling store_[fp/vr]_state() and
      load_[fp/vr]_state(). This is the same as the old entry path however the
      old entry path also saved/restored the VRSAVE register, which isn't done
      in the new entry path.
      
      This means that the vrsave register is now volatile across guest exit,
      which is an incorrect change in behaviour.
      
      Fix this by saving/restoring the vrsave register in kvmhv_p9_guest_entry().
      This restores the old, correct, behaviour.
      
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      44b198ae
    • P
      KVM: PPC: Book3S HV: Flush TLB on secondary radix threads · 70ea13f6
      Paul Mackerras 提交于
      When running on POWER9 with kvm_hv.indep_threads_mode = N and the host
      in SMT1 mode, KVM will run guest VCPUs on offline secondary threads.
      If those guests are in radix mode, we fail to load the LPID and flush
      the TLB if necessary, leading to the guest crashing with an
      unsupported MMU fault.  This arises from commit 9a4506e1 ("KVM:
      PPC: Book3S HV: Make radix handle process scoped LPID flush in C,
      with relocation on", 2018-05-17), which didn't consider the case
      where indep_threads_mode = N.
      
      For simplicity, this makes the real-mode guest entry path flush the
      TLB in the same place for both radix and hash guests, as we did before
      9a4506e1, though the code is now C code rather than assembly code.
      We also have the radix TLB flush open-coded rather than calling
      radix__local_flush_tlb_lpid_guest(), because the TLB flush can be
      called in real mode, and in real mode we don't want to invoke the
      tracepoint code.
      
      Fixes: 9a4506e1 ("KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      70ea13f6
    • P
      KVM: PPC: Book3S HV: Move HPT guest TLB flushing to C code · 2940ba0c
      Paul Mackerras 提交于
      This replaces assembler code in book3s_hv_rmhandlers.S that checks
      the kvm->arch.need_tlb_flush cpumask and optionally does a TLB flush
      with C code in book3s_hv_builtin.c.  Note that unlike the radix
      version, the hash version doesn't do an explicit ERAT invalidation
      because we will invalidate and load up the SLB before entering the
      guest, and that will invalidate the ERAT.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      2940ba0c
    • S
      KVM: PPC: Book3S HV: Handle virtual mode in XIVE VCPU push code · 7ae9bda7
      Suraj Jitindar Singh 提交于
      The code in book3s_hv_rmhandlers.S that pushes the XIVE virtual CPU
      context to the hardware currently assumes it is being called in real
      mode, which is usually true.  There is however a path by which it can
      be executed in virtual mode, in the case where indep_threads_mode = N.
      A virtual CPU executing on an offline secondary thread can take a
      hypervisor interrupt in virtual mode and return from the
      kvmppc_hv_entry() call after the kvm_secondary_got_guest label.
      It is possible for it to be given another vcpu to execute before it
      gets to execute the stop instruction.  In that case it will call
      kvmppc_hv_entry() for the second VCPU in virtual mode, and the XIVE
      vCPU push code will be executed in virtual mode.  The result in that
      case will be a host crash due to an unexpected data storage interrupt
      caused by executing the stdcix instruction in virtual mode.
      
      This fixes it by adding a code path for virtual mode, which uses the
      virtual TIMA pointer and normal load/store instructions.
      
      [paulus@ozlabs.org - wrote patch description]
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7ae9bda7
    • P
      KVM: PPC: Book3S HV: Fix XICS-on-XIVE H_IPI when priority = 0 · 1f80ba3d
      Paul Mackerras 提交于
      This fixes a bug in the XICS emulation on POWER9 machines which is
      triggered by the guest doing a H_IPI with priority = 0 (the highest
      priority).  What happens is that the notification interrupt arrives
      at the destination at priority zero.  The loop in scan_interrupts()
      sees that a priority 0 interrupt is pending, but because xc->mfrr is
      zero, we break out of the loop before taking the notification
      interrupt out of the queue and EOI-ing it.  (This doesn't happen
      when xc->mfrr != 0; in that case we process the priority-0 notification
      interrupt on the first iteration of the loop, and then break out of
      a subsequent iteration of the loop with hirq == XICS_IPI.)
      
      To fix this, we move the prio >= xc->mfrr check down to near the end
      of the loop.  However, there are then some other things that need to
      be adjusted.  Since we are potentially handling the notification
      interrupt and also delivering an IPI to the guest in the same loop
      iteration, we need to update pending and handle any q->pending_count
      value before the xc->mfrr check, rather than at the end of the loop.
      Also, we need to update the queue pointers when we have processed and
      EOI-ed the notification interrupt, since we may not do it later.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1f80ba3d
    • P
      KVM: PPC: Book3S HV: smb->smp comment fixup · 6fabc9f2
      Palmer Dabbelt 提交于
      I made the same typo when trying to grep for uses of smp_wmb and figured
      I might as well fix it.
      Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      6fabc9f2
    • A
      KVM: PPC: Book3S: Allocate guest TCEs on demand too · e1a1ef84
      Alexey Kardashevskiy 提交于
      We already allocate hardware TCE tables in multiple levels and skip
      intermediate levels when we can, now it is a turn of the KVM TCE tables.
      Thankfully these are allocated already in 2 levels.
      
      This moves the table's last level allocation from the creating helper to
      kvmppc_tce_put() and kvm_spapr_tce_fault(). Since such allocation cannot
      be done in real mode, this creates a virtual mode version of
      kvmppc_tce_put() which handles allocations.
      
      This adds kvmppc_rm_ioba_validate() to do an additional test if
      the consequent kvmppc_tce_put() needs a page which has not been allocated;
      if this is the case, we bail out to virtual mode handlers.
      
      The allocations are protected by a new mutex as kvm->lock is not suitable
      for the task because the fault handler is called with the mmap_sem held
      but kvmhv_setup_mmu() locks kvm->lock and mmap_sem in the reverse order.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e1a1ef84
    • A
      KVM: PPC: Book3S HV: Avoid lockdep debugging in TCE realmode handlers · 2001825e
      Alexey Kardashevskiy 提交于
      The kvmppc_tce_to_ua() helper is called from real and virtual modes
      and it works fine as long as CONFIG_DEBUG_LOCKDEP is not enabled.
      However if the lockdep debugging is on, the lockdep will most likely break
      in kvm_memslots() because of srcu_dereference_check() so we need to use
      PPC-own kvm_memslots_raw() which uses realmode safe
      rcu_dereference_raw_notrace().
      
      This creates a realmode copy of kvmppc_tce_to_ua() which replaces
      kvm_memslots() with kvm_memslots_raw().
      
      Since kvmppc_rm_tce_to_ua() becomes static and can only be used inside
      HV KVM, this moves it earlier under CONFIG_KVM_BOOK3S_HV_POSSIBLE.
      
      This moves truly virtual-mode kvmppc_tce_to_ua() to where it belongs and
      drops the prmap parameter which was never used in the virtual mode.
      
      Fixes: d3695aa4 ("KVM: PPC: Add support for multiple-TCE hcalls", 2016-02-15)
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      2001825e
    • A
      KVM: PPC: Book3S HV: Fix lockdep warning when entering the guest · 3309bec8
      Alexey Kardashevskiy 提交于
      The trace_hardirqs_on() sets current->hardirqs_enabled and from here
      the lockdep assumes interrupts are enabled although they are remain
      disabled until the context switches to the guest. Consequent
      srcu_read_lock() checks the flags in rcu_lock_acquire(), observes
      disabled interrupts and prints a warning (see below).
      
      This moves trace_hardirqs_on/off closer to __kvmppc_vcore_entry to
      prevent lockdep from being confused.
      
      DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      WARNING: CPU: 16 PID: 8038 at kernel/locking/lockdep.c:4128 check_flags.part.25+0x224/0x280
      [...]
      NIP [c000000000185b84] check_flags.part.25+0x224/0x280
      LR [c000000000185b80] check_flags.part.25+0x220/0x280
      Call Trace:
      [c000003fec253710] [c000000000185b80] check_flags.part.25+0x220/0x280 (unreliable)
      [c000003fec253780] [c000000000187ea4] lock_acquire+0x94/0x260
      [c000003fec253840] [c00800001a1e9768] kvmppc_run_core+0xa60/0x1ab0 [kvm_hv]
      [c000003fec253a10] [c00800001a1ed944] kvmppc_vcpu_run_hv+0x73c/0xec0 [kvm_hv]
      [c000003fec253ae0] [c00800001a1095dc] kvmppc_vcpu_run+0x34/0x48 [kvm]
      [c000003fec253b00] [c00800001a1056bc] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
      [c000003fec253b90] [c00800001a0f3618] kvm_vcpu_ioctl+0x460/0x850 [kvm]
      [c000003fec253d00] [c00000000041c4f4] do_vfs_ioctl+0xe4/0x930
      [c000003fec253db0] [c00000000041ce04] ksys_ioctl+0xc4/0x110
      [c000003fec253e00] [c00000000041ce78] sys_ioctl+0x28/0x80
      [c000003fec253e20] [c00000000000b5a4] system_call+0x5c/0x70
      Instruction dump:
      419e0034 3d220004 39291730 81290000 2f890000 409e0020 3c82ffc6 3c62ffc5
      3884be70 386329c0 4bf6ea71 60000000 <0fe00000> 3c62ffc6 3863be90 4801273d
      irq event stamp: 1025
      hardirqs last  enabled at (1025): [<c00800001a1e9728>] kvmppc_run_core+0xa20/0x1ab0 [kvm_hv]
      hardirqs last disabled at (1024): [<c00800001a1e9358>] kvmppc_run_core+0x650/0x1ab0 [kvm_hv]
      softirqs last  enabled at (0): [<c0000000000f1210>] copy_process.isra.4.part.5+0x5f0/0x1d00
      softirqs last disabled at (0): [<0000000000000000>]           (null)
      ---[ end trace 31180adcc848993e ]---
      possible reason: unannotated irqs-off.
      irq event stamp: 1025
      hardirqs last  enabled at (1025): [<c00800001a1e9728>] kvmppc_run_core+0xa20/0x1ab0 [kvm_hv]
      hardirqs last disabled at (1024): [<c00800001a1e9358>] kvmppc_run_core+0x650/0x1ab0 [kvm_hv]
      softirqs last  enabled at (0): [<c0000000000f1210>] copy_process.isra.4.part.5+0x5f0/0x1d00
      softirqs last disabled at (0): [<0000000000000000>]           (null)
      
      Fixes: 8b24e69f ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry", 2017-06-26)
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      3309bec8
    • S
      KVM: PPC: Book3S HV: Implement real mode H_PAGE_INIT handler · eadfb1c5
      Suraj Jitindar Singh 提交于
      Implement a real mode handler for the H_CALL H_PAGE_INIT which can be
      used to zero or copy a guest page. The page is defined to be 4k and must
      be 4k aligned.
      
      The in-kernel real mode handler halves the time to handle this H_CALL
      compared to handling it in userspace for a hash guest.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      eadfb1c5
    • S
      KVM: PPC: Book3S HV: Implement virtual mode H_PAGE_INIT handler · 2d34d1c3
      Suraj Jitindar Singh 提交于
      Implement a virtual mode handler for the H_CALL H_PAGE_INIT which can be
      used to zero or copy a guest page. The page is defined to be 4k and must
      be 4k aligned.
      
      The in-kernel handler halves the time to handle this H_CALL compared to
      handling it in userspace for a radix guest.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      2d34d1c3
  2. 20 4月, 2019 1 次提交
    • M
      powerpc: Add force enable of DAWR on P9 option · c1fe190c
      Michael Neuling 提交于
      This adds a flag so that the DAWR can be enabled on P9 via:
        echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous
      
      The DAWR was previously force disabled on POWER9 in:
        96541531 powerpc: Disable DAWR in the base POWER9 CPU features
      Also see Documentation/powerpc/DAWR-POWER9.txt
      
      This is a dangerous setting, USE AT YOUR OWN RISK.
      
      Some users may not care about a bad user crashing their box
      (ie. single user/desktop systems) and really want the DAWR.  This
      allows them to force enable DAWR.
      
      This flag can also be used to disable DAWR access. Once this is
      cleared, all DAWR access should be cleared immediately and your
      machine once again safe from crashing.
      
      Userspace may get confused by toggling this. If DAWR is force
      enabled/disabled between getting the number of breakpoints (via
      PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an
      inconsistent view of what's available. Similarly for guests.
      
      For the DAWR to be enabled in a KVM guest, the DAWR needs to be force
      enabled in the host AND the guest. For this reason, this won't work on
      POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the
      dawr_enable_dangerous file will fail if the hypervisor doesn't support
      writing the DAWR.
      
      To double check the DAWR is working, run this kernel selftest:
        tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
      Any errors/failures/skips mean something is wrong.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c1fe190c
  3. 11 4月, 2019 1 次提交
  4. 05 4月, 2019 2 次提交
    • A
      KVM: PPC: Book3S: Protect memslots while validating user address · 345077c8
      Alexey Kardashevskiy 提交于
      Guest physical to user address translation uses KVM memslots and reading
      these requires holding the kvm->srcu lock. However recently introduced
      kvmppc_tce_validate() broke the rule (see the lockdep warning below).
      
      This moves srcu_read_lock(&vcpu->kvm->srcu) earlier to protect
      kvmppc_tce_validate() as well.
      
      =============================
      WARNING: suspicious RCU usage
      5.1.0-rc2-le_nv2_aikATfstn1-p1 #380 Not tainted
      -----------------------------
      include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by qemu-system-ppc/8020:
       #0: 0000000094972fe9 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0xdc/0x850 [kvm]
      
      stack backtrace:
      CPU: 44 PID: 8020 Comm: qemu-system-ppc Not tainted 5.1.0-rc2-le_nv2_aikATfstn1-p1 #380
      Call Trace:
      [c000003fece8f740] [c000000000bcc134] dump_stack+0xe8/0x164 (unreliable)
      [c000003fece8f790] [c000000000181be0] lockdep_rcu_suspicious+0x130/0x170
      [c000003fece8f810] [c0000000000d5f50] kvmppc_tce_to_ua+0x280/0x290
      [c000003fece8f870] [c00800001a7e2c78] kvmppc_tce_validate+0x80/0x1b0 [kvm]
      [c000003fece8f8e0] [c00800001a7e3fac] kvmppc_h_put_tce+0x94/0x3e4 [kvm]
      [c000003fece8f9a0] [c00800001a8baac4] kvmppc_pseries_do_hcall+0x30c/0xce0 [kvm_hv]
      [c000003fece8fa10] [c00800001a8bd89c] kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
      [c000003fece8fae0] [c00800001a7d95dc] kvmppc_vcpu_run+0x34/0x48 [kvm]
      [c000003fece8fb00] [c00800001a7d56bc] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
      [c000003fece8fb90] [c00800001a7c3618] kvm_vcpu_ioctl+0x460/0x850 [kvm]
      [c000003fece8fd00] [c00000000041c4f4] do_vfs_ioctl+0xe4/0x930
      [c000003fece8fdb0] [c00000000041ce04] ksys_ioctl+0xc4/0x110
      [c000003fece8fe00] [c00000000041ce78] sys_ioctl+0x28/0x80
      [c000003fece8fe20] [c00000000000b5a4] system_call+0x5c/0x70
      
      Fixes: 42de7b9e ("KVM: PPC: Validate TCEs against preregistered memory page sizes", 2018-09-10)
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      345077c8
    • S
      KVM: PPC: Book3S HV: Perserve PSSCR FAKE_SUSPEND bit on guest exit · 7cb9eb10
      Suraj Jitindar Singh 提交于
      There is a hardware bug in some POWER9 processors where a treclaim in
      fake suspend mode can cause an inconsistency in the XER[SO] bit across
      the threads of a core, the workaround being to force the core into SMT4
      when doing the treclaim.
      
      The FAKE_SUSPEND bit (bit 10) in the PSSCR is used to control whether a
      thread is in fake suspend or real suspend. The important difference here
      being that thread reconfiguration is blocked in real suspend but not
      fake suspend mode.
      
      When we exit a guest which was in fake suspend mode, we force the core
      into SMT4 while we do the treclaim in kvmppc_save_tm_hv().
      However on the new exit path introduced with the function
      kvmhv_run_single_vcpu() we restore the host PSSCR before calling
      kvmppc_save_tm_hv() which means that if we were in fake suspend mode we
      put the thread into real suspend mode when we clear the
      PSSCR[FAKE_SUSPEND] bit. This means that we block thread reconfiguration
      and the thread which is trying to get the core into SMT4 before it can
      do the treclaim spins forever since it itself is blocking thread
      reconfiguration. The result is that that core is essentially lost.
      
      This results in a trace such as:
      [   93.512904] CPU: 7 PID: 13352 Comm: qemu-system-ppc Not tainted 5.0.0 #4
      [   93.512905] NIP:  c000000000098a04 LR: c0000000000cc59c CTR: 0000000000000000
      [   93.512908] REGS: c000003fffd2bd70 TRAP: 0100   Not tainted  (5.0.0)
      [   93.512908] MSR:  9000000302883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE,TM[SE]>  CR: 22222444  XER: 00000000
      [   93.512914] CFAR: c000000000098a5c IRQMASK: 3
      [   93.512915] PACATMSCRATCH: 0000000000000001
      [   93.512916] GPR00: 0000000000000001 c000003f6cc1b830 c000000001033100 0000000000000004
      [   93.512928] GPR04: 0000000000000004 0000000000000002 0000000000000004 0000000000000007
      [   93.512930] GPR08: 0000000000000000 0000000000000004 0000000000000000 0000000000000004
      [   93.512932] GPR12: c000203fff7fc000 c000003fffff9500 0000000000000000 0000000000000000
      [   93.512935] GPR16: 2000000000300375 000000000000059f 0000000000000000 0000000000000000
      [   93.512951] GPR20: 0000000000000000 0000000000080053 004000000256f41f c000003f6aa88ef0
      [   93.512953] GPR24: c000003f6aa89100 0000000000000010 0000000000000000 0000000000000000
      [   93.512956] GPR28: c000003f9e9a0800 0000000000000000 0000000000000001 c000203fff7fc000
      [   93.512959] NIP [c000000000098a04] pnv_power9_force_smt4_catch+0x1b4/0x2c0
      [   93.512960] LR [c0000000000cc59c] kvmppc_save_tm_hv+0x40/0x88
      [   93.512960] Call Trace:
      [   93.512961] [c000003f6cc1b830] [0000000000080053] 0x80053 (unreliable)
      [   93.512965] [c000003f6cc1b8a0] [c00800001e9cb030] kvmhv_p9_guest_entry+0x508/0x6b0 [kvm_hv]
      [   93.512967] [c000003f6cc1b940] [c00800001e9cba44] kvmhv_run_single_vcpu+0x2dc/0xb90 [kvm_hv]
      [   93.512968] [c000003f6cc1ba10] [c00800001e9cc948] kvmppc_vcpu_run_hv+0x650/0xb90 [kvm_hv]
      [   93.512969] [c000003f6cc1bae0] [c00800001e8f620c] kvmppc_vcpu_run+0x34/0x48 [kvm]
      [   93.512971] [c000003f6cc1bb00] [c00800001e8f2d4c] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
      [   93.512972] [c000003f6cc1bb90] [c00800001e8e3918] kvm_vcpu_ioctl+0x460/0x7d0 [kvm]
      [   93.512974] [c000003f6cc1bd00] [c0000000003ae2c0] do_vfs_ioctl+0xe0/0x8e0
      [   93.512975] [c000003f6cc1bdb0] [c0000000003aeb24] ksys_ioctl+0x64/0xe0
      [   93.512978] [c000003f6cc1be00] [c0000000003aebc8] sys_ioctl+0x28/0x80
      [   93.512981] [c000003f6cc1be20] [c00000000000b3a4] system_call+0x5c/0x70
      [   93.512983] Instruction dump:
      [   93.512986] 419dffbc e98c0000 2e8b0000 38000001 60000000 60000000 60000000 40950068
      [   93.512993] 392bffff 39400000 79290020 39290001 <7d2903a6> 60000000 60000000 7d235214
      
      To fix this we preserve the PSSCR[FAKE_SUSPEND] bit until we call
      kvmppc_save_tm_hv() which will mean the core can get into SMT4 and
      perform the treclaim. Note kvmppc_save_tm_hv() clears the
      PSSCR[FAKE_SUSPEND] bit again so there is no need to explicitly do that.
      
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7cb9eb10
  5. 21 3月, 2019 1 次提交
    • M
      powerpc/security: Fix spectre_v2 reporting · 92edf8df
      Michael Ellerman 提交于
      When I updated the spectre_v2 reporting to handle software count cache
      flush I got the logic wrong when there's no software count cache
      enabled at all.
      
      The result is that on systems with the software count cache flush
      disabled we print:
      
        Mitigation: Indirect branch cache disabled, Software count cache flush
      
      Which correctly indicates that the count cache is disabled, but
      incorrectly says the software count cache flush is enabled.
      
      The root of the problem is that we are trying to handle all
      combinations of options. But we know now that we only expect to see
      the software count cache flush enabled if the other options are false.
      
      So split the two cases, which simplifies the logic and fixes the bug.
      We were also missing a space before "(hardware accelerated)".
      
      The result is we see one of:
      
        Mitigation: Indirect branch serialisation (kernel only)
        Mitigation: Indirect branch cache disabled
        Mitigation: Software count cache flush
        Mitigation: Software count cache flush (hardware accelerated)
      
      Fixes: ee13cb24 ("powerpc/64s: Add support for software count cache flush")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NMichael Neuling <mikey@neuling.org>
      Reviewed-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      92edf8df
  6. 20 3月, 2019 1 次提交
    • B
      powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations · 8bc08689
      Ben Hutchings 提交于
      MAX_PHYSMEM_BITS only needs to be defined if CONFIG_SPARSEMEM is
      enabled, and that was the case before commit 4ffe713b
      ("powerpc/mm: Increase the max addressable memory to 2PB").
      
      On 32-bit systems, where CONFIG_SPARSEMEM is not enabled, we now
      define it as 46.  That is larger than the real number of physical
      address bits, and breaks calculations in zsmalloc:
      
        mm/zsmalloc.c:130:49: warning: right shift count is negative
          MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
                                                         ^~
        ...
        mm/zsmalloc.c:253:21: error: variably modified 'size_class' at file scope
          struct size_class *size_class[ZS_SIZE_CLASSES];
                             ^~~~~~~~~~
      
      Fixes: 4ffe713b ("powerpc/mm: Increase the max addressable memory to 2PB")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8bc08689
  7. 18 3月, 2019 2 次提交
    • C
      powerpc/6xx: fix setup and use of SPRN_SPRG_PGDIR for hash32 · 4622a2d4
      Christophe Leroy 提交于
      Not only the 603 but all 6xx need SPRN_SPRG_PGDIR to be initialised at
      startup. This patch move it from __setup_cpu_603() to start_here()
      and __secondary_start(), close to the initialisation of SPRN_THREAD.
      
      Previously, virt addr of PGDIR was retrieved from thread struct.
      Now that it is the phys addr which is stored in SPRN_SPRG_PGDIR,
      hash_page() shall not convert it to phys anymore.
      This patch removes the conversion.
      
      Fixes: 93c4a162 ("powerpc/6xx: Store PGDIR physical address in a SPRG")
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4622a2d4
    • M
      powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038 · b5b4453e
      Michael Ellerman 提交于
      Jakub Drnec reported:
        Setting the realtime clock can sometimes make the monotonic clock go
        back by over a hundred years. Decreasing the realtime clock across
        the y2k38 threshold is one reliable way to reproduce. Allegedly this
        can also happen just by running ntpd, I have not managed to
        reproduce that other than booting with rtc at >2038 and then running
        ntp. When this happens, anything with timers (e.g. openjdk) breaks
        rather badly.
      
      And included a test case (slightly edited for brevity):
        #define _POSIX_C_SOURCE 199309L
        #include <stdio.h>
        #include <time.h>
        #include <stdlib.h>
        #include <unistd.h>
      
        long get_time(void) {
          struct timespec tp;
          clock_gettime(CLOCK_MONOTONIC, &tp);
          return tp.tv_sec + tp.tv_nsec / 1000000000;
        }
      
        int main(void) {
          long last = get_time();
          while(1) {
            long now = get_time();
            if (now < last) {
              printf("clock went backwards by %ld seconds!\n", last - now);
            }
            last = now;
            sleep(1);
          }
          return 0;
        }
      
      Which when run concurrently with:
       # date -s 2040-1-1
       # date -s 2037-1-1
      
      Will detect the clock going backward.
      
      The root cause is that wtom_clock_sec in struct vdso_data is only a
      32-bit signed value, even though we set its value to be equal to
      tk->wall_to_monotonic.tv_sec which is 64-bits.
      
      Because the monotonic clock starts at zero when the system boots the
      wall_to_montonic.tv_sec offset is negative for current and future
      dates. Currently on a freshly booted system the offset will be in the
      vicinity of negative 1.5 billion seconds.
      
      However if the wall clock is set past the Y2038 boundary, the offset
      from wall to monotonic becomes less than negative 2^31, and no longer
      fits in 32-bits. When that value is assigned to wtom_clock_sec it is
      truncated and becomes positive, causing the VDSO assembly code to
      calculate CLOCK_MONOTONIC incorrectly.
      
      That causes CLOCK_MONOTONIC to jump ahead by ~4 billion seconds which
      it is not meant to do. Worse, if the time is then set back before the
      Y2038 boundary CLOCK_MONOTONIC will jump backward.
      
      We can fix it simply by storing the full 64-bit offset in the
      vdso_data, and using that in the VDSO assembly code. We also shuffle
      some of the fields in vdso_data to avoid creating a hole.
      
      The original commit that added the CLOCK_MONOTONIC support to the VDSO
      did actually use a 64-bit value for wtom_clock_sec, see commit
      a7f290da ("[PATCH] powerpc: Merge vdso's and add vdso support to
      32 bits kernel") (Nov 2005). However just 3 days later it was
      converted to 32-bits in commit 0c37ec2a ("[PATCH] powerpc: vdso
      fixes (take #2)"), and the bug has existed since then AFAICS.
      
      Fixes: 0c37ec2a ("[PATCH] powerpc: vdso fixes (take #2)")
      Cc: stable@vger.kernel.org # v2.6.15+
      Link: http://lkml.kernel.org/r/HaC.ZfES.62bwlnvAvMP.1STMMj@seznam.czReported-by: NJakub Drnec <jaydee@email.cz>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b5b4453e
  8. 17 3月, 2019 2 次提交
  9. 13 3月, 2019 7 次提交
    • M
      powerpc/64s: Include <asm/nmi.h> header file to fix a warning · de3c83c2
      Mathieu Malaterre 提交于
      Make sure to include <asm/nmi.h> to provide the following prototype:
      hv_nmi_check_nonrecoverable.
      
      Remove the following warning treated as error (W=1):
      
        arch/powerpc/kernel/traps.c:393:6: error: no previous prototype for 'hv_nmi_check_nonrecoverable'
      
      Fixes: ccd47702 ("powerpc/64s: Fix HV NMI vs HV interrupt recoverability test")
      Signed-off-by: NMathieu Malaterre <malat@debian.org>
      Reviewed-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      de3c83c2
    • A
      powerpc/powernv: Fix compile without CONFIG_TRACEPOINTS · 17028776
      Alexey Kardashevskiy 提交于
      The functions returns s64 but the return statement is missing.
      This adds the missing return statement.
      
      Fixes: 75d9fc7f ("powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to C")
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      17028776
    • M
      treewide: add checks for the return value of memblock_alloc*() · 8a7f97b9
      Mike Rapoport 提交于
      Add check for the return value of memblock_alloc*() functions and call
      panic() in case of error.  The panic message repeats the one used by
      panicing memblock allocators with adjustment of parameters to include
      only relevant ones.
      
      The replacement was mostly automated with semantic patches like the one
      below with manual massaging of format strings.
      
        @@
        expression ptr, size, align;
        @@
        ptr = memblock_alloc(size, align);
        + if (!ptr)
        + 	panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);
      
      [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type]
        Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org
      [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc]
        Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com
      [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails]
        Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
      [akpm@linux-foundation.org: fix xtensa printk warning]
      Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAnders Roxell <anders.roxell@linaro.org>
      Reviewed-by: Guo Ren <ren_guo@c-sky.com>		[c-sky]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>	[s390]
      Reviewed-by: Juergen Gross <jgross@suse.com>		[Xen]
      Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Acked-by: Max Filippov <jcmvbkbc@gmail.com>		[xtensa]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a7f97b9
    • M
      memblock: drop memblock_alloc_base() · 0ba9e6ed
      Mike Rapoport 提交于
      The memblock_alloc_base() function tries to allocate a memory up to the
      limit specified by its max_addr parameter and panics if the allocation
      fails.  Replace its usage with memblock_phys_alloc_range() and make the
      callers check the return value and panic in case of error.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-10-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ba9e6ed
    • M
      memblock: memblock_phys_alloc(): don't panic · ecc3e771
      Mike Rapoport 提交于
      Make the memblock_phys_alloc() function an inline wrapper for
      memblock_phys_alloc_range() and update the memblock_phys_alloc() callers
      to check the returned value and panic in case of error.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-8-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ecc3e771
    • M
      memblock: memblock_phys_alloc_try_nid(): don't panic · 33755574
      Mike Rapoport 提交于
      The memblock_phys_alloc_try_nid() function tries to allocate memory from
      the requested node and then falls back to allocation from any node in
      the system.  The memblock_alloc_base() fallback used by this function
      panics if the allocation fails.
      
      Replace the memblock_alloc_base() fallback with the direct call to
      memblock_alloc_range_nid() and update the memblock_phys_alloc_try_nid()
      callers to check the returned value and panic in case of error.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-7-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33755574
    • C
      powerpc: use memblock functions returning virtual address · 1269f7b8
      Christophe Leroy 提交于
      Since only the virtual address of allocated blocks is used, lets use
      functions returning directly virtual address.
      
      Those functions have the advantage of also zeroing the block.
      
      [rppt@linux.ibm.com: powerpc: remove duplicated alloc_stack() function]
        Link: http://lkml.kernel.org/r/20190226064032.GA5873@rapoport-lnx
      [rppt@linux.ibm.com: updated error message in alloc_stack() to be more verbose]
      [rppt@linux.ibm.com: convereted several additional call sites ]
      Link: http://lkml.kernel.org/r/1548057848-15136-3-git-send-email-rppt@linux.ibm.comSigned-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1269f7b8