1. 22 Dec, 2018: 1 commit
  2. 26 Nov, 2018: 1 commit
  3. 19 Sep, 2018: 2 commits
    • powerpc/pseries: Dump the SLB contents on SLB MCE errors. · c6d15258
      Mahesh Salgaonkar committed
      If we get a machine check exception due to SLB errors, dump the
      current SLB contents, which is very helpful in debugging the root
      cause of SLB errors. Introduce an exclusive per-CPU buffer to hold
      faulty SLB entries. In real mode, the MCE handler saves the old SLB
      contents into this buffer, which is accessible through the paca, and
      prints them out later in virtual mode.
      
      With this patch, the console logs SLB contents like the following on
      SLB MCE errors:
      
      [  507.297236] SLB contents of cpu 0x1
      [  507.297237] Last SLB entry inserted at slot 16
      [  507.297238] 00 c000000008000000 400ea1b217000500
      [  507.297239]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
      [  507.297240] 01 d000000008000000 400d43642f000510
      [  507.297242]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297243] 11 f000000008000000 400a86c85f000500
      [  507.297244]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
      [  507.297245] 12 00007f0008000000 4008119624000d90
      [  507.297246]   1T  ESID=       7f  VSID=      8119624 LLP:110
      [  507.297247] 13 0000000018000000 00092885f5150d90
      [  507.297247]  256M ESID=        1  VSID=   92885f5150 LLP:110
      [  507.297248] 14 0000010008000000 4009e7cb50000d90
      [  507.297249]   1T  ESID=        1  VSID=      9e7cb50 LLP:110
      [  507.297250] 15 d000000008000000 400d43642f000510
      [  507.297251]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297252] 16 d000000008000000 400d43642f000510
      [  507.297253]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297253] ----------------------------------
      [  507.297254] SLB cache ptr value = 3
      [  507.297254] Valid SLB cache entries:
      [  507.297255] 00 EA[0-35]=    7f000
      [  507.297256] 01 EA[0-35]=        1
      [  507.297257] 02 EA[0-35]=     1000
      [  507.297257] Rest of SLB cache entries:
      [  507.297258] 03 EA[0-35]=    7f000
      [  507.297258] 04 EA[0-35]=        1
      [  507.297259] 05 EA[0-35]=     1000
      [  507.297260] 06 EA[0-35]=       12
      [  507.297260] 07 EA[0-35]=    7f000
      Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
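      Each decoded line in the log above can be derived from its raw (ESID, VSID) pair. The user-space sketch below assumes the Book3S-64 hash MMU field layout (valid bit, 1T segment bit, ESID/VSID shifts, LLP mask) as the kernel's mmu-hash.h headers define it, to the best of my recollection; verify the constants against your tree. The struct slb_fields type and the slb_decode()/slb_print() helpers are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed Book3S-64 hash MMU constants (verify against your tree). */
#define SLB_ESID_V        0x0000000008000000ULL /* entry is valid */
#define SLB_VSID_B        0xc000000000000000ULL /* segment size field */
#define SLB_VSID_B_1T     0x4000000000000000ULL /* 1T segment */
#define SLB_VSID_LLP      0x0000000000000130ULL /* L and LP page size bits */
#define SLB_VSID_SHIFT    12                    /* VSID position, 256M */
#define SLB_VSID_SHIFT_1T 24                    /* VSID position, 1T */
#define SID_SHIFT         28                    /* ESID position, 256M */
#define SID_SHIFT_1T      40                    /* ESID position, 1T */
#define SID_MASK          0xfffffffffULL        /* 36-bit ESID */
#define SID_MASK_1T       0xffffffULL           /* 24-bit ESID */

/* Hypothetical decoded view of one SLB entry. */
struct slb_fields {
    int is_1t;
    uint64_t esid;
    uint64_t vsid;
    uint64_t llp;
};

/* Decode a saved (esid, vsid) pair; returns 0 for an invalid entry. */
static int slb_decode(uint64_t e, uint64_t v, struct slb_fields *out)
{
    if (!(e & SLB_ESID_V))
        return 0;
    out->is_1t = (v & SLB_VSID_B_1T) != 0;
    out->llp = v & SLB_VSID_LLP;
    if (out->is_1t) {
        out->esid = (e >> SID_SHIFT_1T) & SID_MASK_1T;
        out->vsid = (v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T;
    } else {
        out->esid = (e >> SID_SHIFT) & SID_MASK;
        out->vsid = (v & ~SLB_VSID_B) >> SLB_VSID_SHIFT;
    }
    return 1;
}

/* Print one slot in the same two-line layout as the console log. */
static void slb_print(int slot, uint64_t e, uint64_t v)
{
    struct slb_fields f;

    printf("%02d %016llx %016llx\n", slot,
           (unsigned long long)e, (unsigned long long)v);
    if (!slb_decode(e, v, &f))
        return;
    printf("%s ESID=%9llx  VSID=%13llx LLP:%3llx\n",
           f.is_1t ? "  1T " : " 256M",
           (unsigned long long)f.esid, (unsigned long long)f.vsid,
           (unsigned long long)f.llp);
}
```

      With these assumed constants, slb_print(0, 0xc000000008000000, 0x400ea1b217000500) reproduces the two slot 00 lines of the sample log.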
    • powerpc/pseries: Flush SLB contents on SLB MCE errors. · a43c1590
      Mahesh Salgaonkar committed
      On pseries, the system currently crashes if we get a machine check
      exception due to SLB errors. These are soft errors and can be fixed
      by flushing the SLBs, so the kernel can continue to function instead
      of crashing. We do this in real mode, before turning on the MMU;
      otherwise we would run into nested machine checks. This patch fetches
      the RTAS error log in real mode and flushes the SLBs on SLB/ERAT errors.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michal Suchanek <msuchanek@suse.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
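      A minimal sketch of the recovery decision described above. It assumes a hypothetical error-type enum in place of the fields the kernel extracts from the RTAS extended error log, and a stand-in flush_slbs() that only counts calls. It shows only the shape of the logic: SLB and ERAT errors are treated as recoverable soft errors and flushed in real mode, and everything else falls through to the existing handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error classes, standing in for what the handler reads
 * out of the RTAS extended error log. */
enum mce_err_type {
    MCE_ERR_UE,
    MCE_ERR_SLB,
    MCE_ERR_ERAT,
    MCE_ERR_TLB,
    MCE_ERR_OTHER,
};

static int slb_flush_count;   /* how many times the stand-in flush ran */

/* Stand-in for invalidating the SLB; in the kernel this has to happen in
 * real mode, before the MMU is switched back on. */
static void flush_slbs(void)
{
    slb_flush_count++;
}

/* Returns true if the error was handled (recovered) in real mode. */
static bool realmode_mce_recover(enum mce_err_type type)
{
    switch (type) {
    case MCE_ERR_SLB:
    case MCE_ERR_ERAT:
        /* Soft errors: drop the possibly corrupted translations. */
        flush_slbs();
        return true;
    default:
        /* Leave everything else to the existing handling. */
        return false;
    }
}
```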
  4. 07 Aug, 2018: 3 commits
    • powerpc/pseries: Query hypervisor for count cache flush settings · ba72dc17
      Michael Ellerman committed
      Use the existing hypercall to determine the appropriate settings for
      the count cache flush, and then call the generic powerpc code to set
      it up based on the security feature flags.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
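      A simplified sketch of how the hypercall result could map to a count cache flush mode. The flag bit positions are assumptions based on the kernel's hvcall.h as I recall it (IBM bit numbering, bit 0 = most significant bit), and pick_flush() and its enum are hypothetical. In the kernel, the result is first turned into security feature flags, and the generic powerpc code then chooses the flush type from those flags.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit. */
#define PPC_BIT(n) (1ULL << (63 - (n)))

/* Assumed flag positions (verify against hvcall.h). */
#define H_CPU_CHAR_COUNT_CACHE_DISABLED PPC_BIT(7)
#define H_CPU_CHAR_BCCTR_FLUSH_ASSIST   PPC_BIT(9)
#define H_CPU_BEHAV_FLUSH_COUNT_CACHE   PPC_BIT(5)

/* Hypothetical flush modes. */
enum count_cache_flush { FLUSH_NONE, FLUSH_SW, FLUSH_HW_ASSIST };

/* Pick a flush mode from the H_GET_CPU_CHARACTERISTICS result. */
static enum count_cache_flush pick_flush(uint64_t character, uint64_t behaviour)
{
    if (character & H_CPU_CHAR_COUNT_CACHE_DISABLED)
        return FLUSH_NONE;          /* no count cache to flush */
    if (!(behaviour & H_CPU_BEHAV_FLUSH_COUNT_CACHE))
        return FLUSH_NONE;          /* hypervisor does not ask for a flush */
    if (character & H_CPU_CHAR_BCCTR_FLUSH_ASSIST)
        return FLUSH_HW_ASSIST;     /* hardware-assisted flush sequence */
    return FLUSH_SW;                /* plain software flush sequence */
}
```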
    • powerpc/64: Call setup_barrier_nospec() from setup_arch() · af375eef
      Michael Ellerman committed
      Currently we require platform code to call setup_barrier_nospec(). But
      if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case
      then we can call it in setup_arch().
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
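      The change relies on the usual kernel idiom of an empty static inline stub for the disabled configuration, so that generic code can call the function unconditionally. A user-space sketch: the config macro mirrors the name in the message, while the body of the enabled variant and the barrier_nospec_enabled flag are placeholders.

```c
#include <assert.h>
#include <stdbool.h>

/* Leave this undefined to exercise the !CONFIG_PPC_BARRIER_NOSPEC path. */
/* #define CONFIG_PPC_BARRIER_NOSPEC 1 */

static bool barrier_nospec_enabled;   /* placeholder state */

#ifdef CONFIG_PPC_BARRIER_NOSPEC
/* Enabled config: a real implementation would decide here. */
static void setup_barrier_nospec(void)
{
    barrier_nospec_enabled = true;    /* placeholder decision */
}
#else
/* Disabled config: empty stub, so callers need no #ifdef. */
static inline void setup_barrier_nospec(void) { }
#endif

/* Generic setup no longer depends on each platform making the call. */
static void setup_arch(void)
{
    setup_barrier_nospec();
}
```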
    • powerpc/pseries: Defer the logging of rtas error to irq work queue. · 94675cce
      Mahesh Salgaonkar committed
      rtas_log_buf is a buffer to hold RTAS event data that are communicated
      to the kernel by the hypervisor. This buffer is then used to pass RTAS
      event data to userspace through procfs. This buffer is allocated from
      the vmalloc (non-linear mapping) area.
      
      On a machine check interrupt, register r3 points to the RTAS extended
      event log passed by the hypervisor that contains the MCE event. The
      pseries machine check handler then logs this error into rtas_log_buf.
      Since rtas_log_buf is a vmalloc-ed (non-linear) buffer, we end up taking
      a page fault (vector 0x300) while accessing it. Because the machine
      check interrupt handler runs in NMI context, we cannot afford to take
      any page fault. Page faults are not honored in NMI context and cause a
      kernel panic. Apart from that, as Nick pointed out,
      pSeries_log_error() also takes a spin_lock while logging the error,
      which is not safe in NMI context. It may end up in deadlock if we get
      another MCE before releasing the lock. Fix this by deferring the
      logging of the rtas error to the irq work queue.
      
      The current implementation uses two different buffers to hold the rtas
      error log, depending on whether the extended log is provided or not.
      This makes it a bit difficult to identify which buffer has valid data
      that needs to be logged later in irq work. Simplify this by using a
      single buffer, one per paca, and copying the rtas log to it irrespective
      of whether the extended log is provided or not. Allocate this buffer
      below the RMA region so that it can be accessed in the real mode MCE
      handler.
      
      Fixes: b96672dd ("powerpc: Machine check interrupt is a non-maskable interrupt")
      Cc: stable@vger.kernel.org # v4.14+
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
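      The deferral pattern can be sketched in user space as follows. It assumes a hypothetical fixed-size, statically allocated per-CPU slot in place of the per-paca buffer (in the kernel that buffer sits below the RMA so real mode can reach it). The NMI-context half only copies the log and marks the slot pending, which is where the kernel would call irq_work_queue(); the deferred half does the actual logging. A second MCE arriving before the deferred work runs would overwrite the slot, which this sketch does not handle.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS_SKETCH 4     /* hypothetical CPU count */
#define RTAS_LOG_MAX   64    /* hypothetical buffer size */

/* Preallocated per-CPU slot: the NMI-context half needs no allocation,
 * no vmalloc access and no locks. */
struct mce_log_slot {
    size_t len;
    int pending;             /* set in NMI context, cleared by the work */
    unsigned char data[RTAS_LOG_MAX];
};

static struct mce_log_slot mce_slots[NR_CPUS_SKETCH];

/* NMI-context half: copy the event and mark it for deferred logging. */
static void mce_handler_save(int cpu, const void *log, size_t len)
{
    struct mce_log_slot *s = &mce_slots[cpu];

    if (len > RTAS_LOG_MAX)
        len = RTAS_LOG_MAX;  /* truncate rather than fail */
    memcpy(s->data, log, len);
    s->len = len;
    s->pending = 1;          /* the kernel would call irq_work_queue() here */
}

/* Deferred half: runs later, outside NMI context, where taking locks and
 * touching vmalloc memory is safe. Returns the number of bytes logged. */
static size_t mce_deferred_log(int cpu)
{
    struct mce_log_slot *s = &mce_slots[cpu];
    size_t n;

    if (!s->pending)
        return 0;
    n = s->len;
    printf("cpu %d: logging %zu bytes of RTAS error log\n", cpu, n);
    s->pending = 0;
    return n;
}
```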
  5. 01 Aug, 2018: 1 commit
    • PCI: Fix is_added/is_busmaster race condition · 44bda4b7
      Hari Vyas committed
      When a PCI device is detected, pdev->is_added is set to 1 and proc and
      sysfs entries are created.
      
      When the device is removed, pdev->is_added is checked for one, then the
      device is detached and its proc and sysfs entries are cleared, and
      finally pdev->is_added is set to 0.
      
      is_added and is_busmaster are bit fields in the pci_dev structure that
      share the same memory location.
      
      A strange issue was observed with repeated removal and rescan of a PCIe
      NVMe device using sysfs commands: the is_added flag was observed as zero
      instead of one while removing the device, so the proc and sysfs entries
      were not cleared. This causes a problem on later device addition, with
      the warning message "proc_dir_entry" already registered.
      
      Debugging revealed a race condition between the PCI core setting the
      is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
      setting the is_busmaster bit in pci_set_master(). As these fields are not
      handled atomically, the update of is_busmaster can clear the is_added bit.
      
      Move the is_added bit to a separate private flag variable and use atomic
      functions to set and retrieve the device addition state. This avoids the
      race because is_added no longer shares a memory location with is_busmaster.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
      Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Lukas Wunner <lukas@wunner.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
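      The lost update can be replayed deterministically. This sketch performs the two writers' read-modify-write of the shared bit-field word by hand, in the problematic order, then shows the fixed layout with is_added kept in a separate flag word. It uses C11 <stdatomic.h> in place of the kernel's bitops, and the struct names are hypothetical; pci_dev_assign_added() and pci_dev_is_added() follow the helper names I believe the fix introduced, but treat them as illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Buggy layout: both 1-bit fields live in one word, so every update
 * is a read-modify-write of the whole word. */
struct pdev_buggy {
    unsigned int is_added : 1;
    unsigned int is_busmaster : 1;
};

/* Replay the bad interleaving: both writers load the word before
 * either stores it back, and the stale store wins. */
static struct pdev_buggy replay_race(void)
{
    struct pdev_buggy dev = { 0, 0 };
    struct pdev_buggy core_copy = dev;  /* PCI core loads the word */
    struct pdev_buggy nvme_copy = dev;  /* NVMe reset work loads the word */

    core_copy.is_added = 1;             /* pci_bus_add_device() */
    dev = core_copy;                    /* core stores first */
    nvme_copy.is_busmaster = 1;         /* pci_set_master() */
    dev = nvme_copy;                    /* stale copy wipes is_added */
    return dev;
}

#define PCI_DEV_ADDED 0                 /* bit number in priv_flags */

/* Fixed layout: is_added moves to its own atomically updated word. */
struct pdev_fixed {
    atomic_ulong priv_flags;
    unsigned int is_busmaster : 1;
};

static void pci_dev_assign_added(struct pdev_fixed *dev, bool added)
{
    if (added)
        atomic_fetch_or(&dev->priv_flags, 1UL << PCI_DEV_ADDED);
    else
        atomic_fetch_and(&dev->priv_flags, ~(1UL << PCI_DEV_ADDED));
}

static bool pci_dev_is_added(struct pdev_fixed *dev)
{
    return atomic_load(&dev->priv_flags) & (1UL << PCI_DEV_ADDED);
}
```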
  6. 31 Jul, 2018: 1 commit
    • powerpc/pseries: fix EEH recovery of some IOV devices · b87b9cf4
      Sam Bobroff committed
      EEH recovery currently fails on pSeries for some IOV capable PCI
      devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
      certain device tree properties for the device. (Found on an IOV
      capable device using the ipr driver.)
      
      Recovery fails in pci_enable_resources() at the check on r->parent,
      because r->flags is set and r->parent is not.  This state is due to
      sriov_init() setting the start, end and flags members of the IOV BARs
      but the parent not being set later in
      pseries_pci_fixup_iov_resources(), because the
      "ibm,open-sriov-vf-bar-info" property is missing.
      
      Correct this by zeroing the resource flags for IOV BARs when they
      can't be configured (this is the same method used by sriov_init() and
      __pci_read_base()).
      
      VFs cleared this way can't be enabled later, because that requires
      another device tree property, "ibm,number-of-configurable-vfs" as well
      as support for the RTAS function "ibm_map_pes". These are all part of
      hypervisor support for IOV and it seems unlikely that a hypervisor
      would ever partially, but not fully, support it. (None are currently
      provided by QEMU/KVM.)
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Reviewed-by: Bryant G. Ly <bryantly@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
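      A sketch of the failing check and the fix, using a hypothetical trimmed-down struct res in place of struct resource. enable_resources() mirrors only the "flags set but no parent" test described above, and disable_iov_bars() shows the zeroing of the flags. The IORESOURCE_MEM value and the count of six IOV BARs follow linux/ioport.h and PCI_SRIOV_NUM_BARS, as far as I know.

```c
#include <assert.h>
#include <stddef.h>

#define IORESOURCE_MEM 0x00000200UL  /* as in linux/ioport.h */
#define NUM_IOV_BARS   6             /* PCI_SRIOV_NUM_BARS */

/* Hypothetical stand-in for struct resource. */
struct res {
    unsigned long start, end, flags;
    struct res *parent;
};

/* The check that failed: a resource that claims to exist (flags set)
 * must have been claimed in the resource tree (parent set). */
static int enable_resources(const struct res *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (r[i].flags && !r[i].parent)
            return -1;               /* the kernel returns -EINVAL */
    return 0;
}

/* The fix: when firmware gives no way to configure the IOV BARs,
 * clear their flags so later code treats them as absent. */
static void disable_iov_bars(struct res *iov, size_t n)
{
    for (size_t i = 0; i < n; i++)
        iov[i].flags = 0;
}
```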
  7. 30 Jul, 2018: 1 commit
  8. 03 Jun, 2018: 1 commit
  9. 22 May, 2018: 1 commit
  10. 03 Apr, 2018: 1 commit
    • powerpc/pseries: Restore default security feature flags on setup · 6232774f
      Mauricio Faria de Oliveira committed
      After migration the security feature flags might have changed (e.g.,
      destination system with unpatched firmware), but some flags are not
      set/cleared again in init_cpu_char_feature_flags() because it assumes
      the security flags to be the defaults.
      
      Additionally, if the H_GET_CPU_CHARACTERISTICS hypercall fails then
      init_cpu_char_feature_flags() does not run again, which might leave
      the system in an insecure or sub-optimal configuration.
      
      So, just restore the security feature flags to the defaults assumed
      by init_cpu_char_feature_flags() so it can set/clear them correctly,
      and to ensure safe settings are in place in case the hypercall fails.
      
      Fixes: f636c147 ("powerpc/pseries: Set or clear security feature flags")
      Depends-on: 19887d6a28e2 ("powerpc: Move default security feature flags")
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
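      Why the reset matters can be shown with a small model. The feature bits, the default set and the struct cpu_char hypercall result below are hypothetical (the kernel's SEC_FTR_* values and the H_GET_CPU_CHARACTERISTICS result differ). The point is that init_cpu_char_feature_flags() only moves flags away from the defaults, so every setup, including the one after migration, has to start from them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits. */
#define FTR_FAVOUR_SECURITY  (1ULL << 0)
#define FTR_L1D_FLUSH_PR     (1ULL << 1)
#define FTR_L1D_FLUSH_HV     (1ULL << 2)
#define FTR_BNDS_CHK_SPECTRE (1ULL << 3)
#define FTR_L1D_THREAD_PRIV  (1ULL << 4)

/* Safe defaults: assume everything needs mitigating. */
#define FTR_DEFAULT (FTR_FAVOUR_SECURITY | FTR_L1D_FLUSH_PR | \
                     FTR_L1D_FLUSH_HV | FTR_BNDS_CHK_SPECTRE)

static uint64_t sec_features = FTR_DEFAULT;

/* Hypothetical hypercall result; ok == false models a failed call. */
struct cpu_char { bool ok; bool favour_security; bool thread_priv; };

/* Only sets bits the defaults lack or clears bits they have, so it is
 * correct only when starting from FTR_DEFAULT. */
static void init_cpu_char_feature_flags(const struct cpu_char *c)
{
    if (c->thread_priv)
        sec_features |= FTR_L1D_THREAD_PRIV;
    if (!c->favour_security)
        sec_features &= ~FTR_FAVOUR_SECURITY;
}

static void setup_security(const struct cpu_char *c)
{
    sec_features = FTR_DEFAULT;     /* the fix: restore defaults first */
    if (c->ok)
        init_cpu_char_feature_flags(c);
}
```

      In the usage below, a failed hypercall after migration leaves the safe defaults in place instead of the source system's stale flags.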
  11. 30 Mar, 2018: 2 commits
    • powerpc/pseries: Fix clearing of security feature flags · 0f9bdfe3
      Mauricio Faria de Oliveira committed
      The H_CPU_BEHAV_* flags should be checked for in the 'behaviour' field
      of 'struct h_cpu_char_result' -- 'character' is for H_CPU_CHAR_*
      flags.
      
      Found by playing around with QEMU's implementation of the hypercall:
      
        H_CPU_CHAR=0xf000000000000000
        H_CPU_BEHAV=0x0000000000000000
      
        This clears H_CPU_BEHAV_FAVOUR_SECURITY and H_CPU_BEHAV_L1D_FLUSH_PR
        so pseries_setup_rfi_flush() disables 'rfi_flush'; and it also
        clears H_CPU_CHAR_L1D_THREAD_PRIV flag. So there is no RFI flush
        mitigation at all for cpu_show_meltdown() to report; but currently
        it does:
      
        Original kernel:
      
          # cat /sys/devices/system/cpu/vulnerabilities/meltdown
          Mitigation: RFI Flush
      
        Patched kernel:
      
          # cat /sys/devices/system/cpu/vulnerabilities/meltdown
          Not affected
      
        H_CPU_CHAR=0x0000000000000000
        H_CPU_BEHAV=0xf000000000000000
      
        This sets H_CPU_BEHAV_BNDS_CHK_SPEC_BAR so cpu_show_spectre_v1() should
        report vulnerable; but currently it doesn't:
      
        Original kernel:
      
          # cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
          Not affected
      
        Patched kernel:
      
          # cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
          Vulnerable
      Brown-paper-bag-by: Michael Ellerman <mpe@ellerman.id.au>
      Fixes: f636c147 ("powerpc/pseries: Set or clear security feature flags")
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
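      The corrected field selection can be sketched as below. struct h_cpu_char_result mirrors the kernel's two 64-bit fields; the flag bit positions are my recollection of hvcall.h and should be treated as assumptions, and struct sec_state and decode() are hypothetical. With these definitions, the two QEMU settings from the message above decode as described there.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_BIT(n) (1ULL << (63 - (n)))   /* IBM bit 0 = MSB */

/* Assumed flag positions (verify against hvcall.h). */
#define H_CPU_CHAR_L1D_THREAD_PRIV    PPC_BIT(4)
#define H_CPU_BEHAV_FAVOUR_SECURITY   PPC_BIT(0)
#define H_CPU_BEHAV_L1D_FLUSH_PR      PPC_BIT(1)
#define H_CPU_BEHAV_BNDS_CHK_SPEC_BAR PPC_BIT(2)

struct h_cpu_char_result {
    uint64_t character;   /* H_CPU_CHAR_* flags */
    uint64_t behaviour;   /* H_CPU_BEHAV_* flags */
};

/* Hypothetical summary of the decoded hypercall result. */
struct sec_state { int favour_security, l1d_flush_pr, bnds_chk, thread_priv; };

static struct sec_state decode(const struct h_cpu_char_result *r)
{
    struct sec_state s;

    s.thread_priv     = !!(r->character & H_CPU_CHAR_L1D_THREAD_PRIV);
    /* The bug tested the next three against r->character. */
    s.favour_security = !!(r->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY);
    s.l1d_flush_pr    = !!(r->behaviour & H_CPU_BEHAV_L1D_FLUSH_PR);
    s.bnds_chk        = !!(r->behaviour & H_CPU_BEHAV_BNDS_CHK_SPEC_BAR);
    return s;
}
```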
    • powerpc/64: Use array of paca pointers and allocate pacas individually · d2e60075
      Nicholas Piggin committed
      Change the paca array into an array of pointers to pacas. Allocate
      pacas individually.
      
      This allows flexibility in where the PACAs are allocated. Future work
      will allocate them node-local. Platforms that don't have address limits
      on PACAs would be able to defer PACA allocations until later in boot
      rather than allocating all possible ones up-front and then freeing the
      unused ones.
      
      This adds slightly more overhead (one additional indirection) for
      cross-CPU paca references, but those aren't too common.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
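      A user-space sketch of the layout change: a pointer array indexed by CPU, with each paca allocated separately. struct paca_struct here is a hypothetical one-field stand-in, calloc() stands in for the kernel's boot-time allocator (a node-local allocator could be substituted per CPU), and error unwinding is omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily trimmed paca. */
struct paca_struct {
    int paca_index;                      /* logical CPU number */
};

static struct paca_struct **paca_ptrs;   /* one pointer per possible CPU */

/* Allocate the pointer array, then each paca on its own. */
static int allocate_pacas(int nr_cpus)
{
    paca_ptrs = calloc((size_t)nr_cpus, sizeof(*paca_ptrs));
    if (!paca_ptrs)
        return -1;
    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        paca_ptrs[cpu] = calloc(1, sizeof(**paca_ptrs));
        if (!paca_ptrs[cpu])
            return -1;                   /* unwinding omitted */
        paca_ptrs[cpu]->paca_index = cpu;
    }
    return 0;
}

/* An unused paca can now be released individually. */
static void free_unused_paca(int cpu)
{
    free(paca_ptrs[cpu]);
    paca_ptrs[cpu] = NULL;
}

/* Cross-CPU access costs one extra load: paca_ptrs[cpu]->field
 * instead of paca[cpu].field. */
static int paca_index_of(int cpu)
{
    return paca_ptrs[cpu]->paca_index;
}
```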
  12. 27 Mar, 2018: 4 commits
  13. 13 Mar, 2018: 1 commit
    • powerpc: Rename plapr routines to plpar · 7c09c186
      Michael Ellerman committed
      Back in 2013 we added some hypercall wrappers which misspelled
      "plpar" (P-series Logical PARtition) as "plapr".
      
      Visually they're hard to distinguish and it almost doesn't matter, but
      it is confusing when grepping to miss some calls because of the typo.
      
      They've also started spreading, so before they take over let's fix
      them all to be "plpar".
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 23 Feb, 2018: 1 commit
  15. 27 Jan, 2018: 1 commit
  16. 22 Jan, 2018: 1 commit
    • powerpc/pseries, ps3: panic flush kernel messages before halting system · 35adacd6
      Nicholas Piggin committed
      Platforms with a panic handler that halts the system can have problems
      getting kernel messages out, because the panic notifiers are called
      before kernel/panic.c does its flushing of printk buffers and console,
      etc.
      
      An attempt to solve this was made in commit a3b2cb30 ("powerpc: Do
      not call ppc_md.panic in fadump panic notifier"), but that wasn't the
      right approach and caused other problems, and was reverted by commit
      ab9dbf77.
      
      Instead, the powernv shutdown paths already had a similar problem,
      fixed by taking the message flushing sequence from kernel/panic.c.
      That's a little bit ugly, but while we have the code duplicated, it
      will work for this case as well. So have ppc panic handlers do the
      same flushing before they terminate.
      
      Without this patch, a qemu pseries_le_defconfig guest stops silently
      when issued the nmi command while xmon is off and no crash dumpers are
      enabled. With this patch, an oops is printed by each CPU as expected.
      
      Fixes: ab9dbf77 ("Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
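      The ordering problem can be modeled in a few lines. This is a hypothetical simulation, not kernel code: messages sit in a buffer until console_flush() copies them to the console, and nothing gets out once halt_system() has run. Because the platform handler runs as a panic notifier before the generic flush, it has to flush on its own before halting.

```c
#include <assert.h>
#include <string.h>

static char log_buf[256];   /* pending kernel messages */
static char console[256];   /* what actually reached the console */
static int halted;

static void reset_sketch(void)
{
    log_buf[0] = console[0] = '\0';
    halted = 0;
}

static void printk_sketch(const char *msg)
{
    strncat(log_buf, msg, sizeof(log_buf) - strlen(log_buf) - 1);
}

static void console_flush(void)
{
    if (halted)
        return;             /* too late: the machine is stopped */
    strncat(console, log_buf, sizeof(console) - strlen(console) - 1);
    log_buf[0] = '\0';
}

static void halt_system(void)
{
    halted = 1;
}

/* Platform panic handler; flush_first models the fix. */
static void platform_panic(int flush_first)
{
    if (flush_first)
        console_flush();
    halt_system();
}

/* Generic path: notifiers run before the generic flush. */
static void panic_sketch(const char *msg, int flush_first)
{
    printk_sketch(msg);
    platform_panic(flush_first);
    console_flush();
}
```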
  17. 10 Jan, 2018: 1 commit
  18. 05 Dec, 2017: 1 commit
    • Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier" · ab9dbf77
      David Gibson committed
      This reverts commit a3b2cb30.
      
      That commit tried to fix problems with panic on powerpc in certain
      circumstances, where some output from the generic panic code was being
      dropped.
      
      Unfortunately, it breaks things worse in other circumstances. In
      particular when running a PAPR guest, it will now attempt to reboot
      instead of informing the hypervisor (KVM or PowerVM) that the guest
      has crashed. The crash notification is important to some
      virtualization management layers.
      
      Revert it for now until we can come up with a better solution.
      
      Fixes: a3b2cb30 ("powerpc: Do not call ppc_md.panic in fadump panic notifier")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      [mpe: Tweak change log a bit]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  19. 04 Dec, 2017: 1 commit
  20. 02 Sep, 2017: 1 commit
    • powerpc/xive: guest exploitation of the XIVE interrupt controller · eac1e731
      Cédric Le Goater committed
      This is the framework for using XIVE in a PowerVM guest. The support
      is very similar to the native one, in a much simpler form.
      
      Each source is associated with an Event State Buffer (ESB). This is a
      two-bit state machine which is used to trigger events. The bits are
      named "P" (pending) and "Q" (queued) and can be controlled by MMIO.
      The guest OS registers event (or notification) queues on which the HW
      will post event data for a target to notify.
      
      Instead of OPAL calls, a set of hypervisor calls is used to configure
      the interrupt sources and the event/notification queues of the guest:
      
       - H_INT_GET_SOURCE_INFO
      
         used to obtain the address of the MMIO page of the Event State
         Buffer (PQ bits) entry associated with the source.
      
       - H_INT_SET_SOURCE_CONFIG
      
         assigns a source to a "target".
      
       - H_INT_GET_SOURCE_CONFIG
      
         determines to which "target" and "priority" a source is assigned.
      
       - H_INT_GET_QUEUE_INFO
      
         returns the address of the notification management page associated
         with the specified "target" and "priority".
      
       - H_INT_SET_QUEUE_CONFIG
      
         sets or resets the event queue for a given "target" and "priority".
         It is also used to set the notification config associated with the
         queue, only unconditional notification for the moment.  Reset is
         performed with a queue size of 0 and queueing is disabled in that
         case.
      
       - H_INT_GET_QUEUE_CONFIG
      
         returns the queue settings for a given "target" and "priority".
      
       - H_INT_RESET
      
         resets all of the partition's interrupt exploitation structures to
         their initial state, losing all configuration set via the hcalls
         H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
      
       - H_INT_SYNC
      
         issues a synchronisation on a source to make sure all
         notifications have reached their queue.
      
      As for XICS, the XIVE interface for the guest is described in the
      device tree under the "interrupt-controller" node. A couple of new
      properties are specific to XIVE:
      
       - "reg"
      
         contains the base address and size of the thread interrupt
         management areas (TIMA), also called rings, for the User level and
         for the Guest OS level. Only the Guest OS level is taken into
         account today.
      
       - "ibm,xive-eq-sizes"
      
         the size of the event queues. One cell per size supported, contains
         log2 of size, in ascending order.
      
       - "ibm,xive-lisn-ranges"
      
         the interrupt number ranges assigned to the guest. These are
         allocated using a simple bitmap.
      
      and also:
      
       - "/ibm,plat-res-int-priorities"
      
         contains a list of priorities that the hypervisor has reserved for
         its own use.
      
      Tested with a QEMU XIVE model for pseries and with the Power hypervisor.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
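      The ESB P/Q behaviour can be sketched as a small state machine. This is a simplified model under stated assumptions, not the hardware specification: a trigger in state 00 forwards an event and moves to 10, a further trigger coalesces into 11, 01 means the source is off, and an EOI resets the state to 00 and reports whether Q was set, in which case software replays the trigger (which, as I understand it, is how the Linux XIVE code handles it). Masked-source EOI is deliberately left as a no-op.

```c
#include <assert.h>
#include <stdbool.h>

/* ESB states, named by their P and Q bits. */
enum esb_pq { PQ_00 = 0, PQ_01 = 1, PQ_10 = 2, PQ_11 = 3 };

#define ESB_Q 0x1   /* Q bit within the 2-bit state */

/* A trigger on the source; returns true if an event is forwarded. */
static bool esb_trigger(unsigned *pq)
{
    if (*pq == PQ_00) {     /* idle: mark pending and notify */
        *pq = PQ_10;
        return true;
    }
    if (*pq == PQ_10)       /* already pending: remember one more */
        *pq = PQ_11;
    /* PQ_11 stays queued; PQ_01 means the source is off. */
    return false;
}

/* EOI: reset to 00 and return true if a trigger arrived meanwhile and
 * must be replayed. A masked (01) source is left untouched. */
static bool esb_eoi(unsigned *pq)
{
    bool replay;

    if (*pq == PQ_01)
        return false;
    replay = (*pq & ESB_Q) != 0;
    *pq = PQ_00;
    return replay;
}
```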
  21. 31 Aug, 2017: 1 commit
  22. 31 Mar, 2017: 1 commit
  23. 31 Jan, 2017: 1 commit
  24. 30 Nov, 2016: 1 commit
  25. 20 Sep, 2016: 1 commit
    • powerpc: Remove all usages of NO_IRQ · ef24ba70
      Michael Ellerman committed
      NO_IRQ has been == 0 on powerpc for just over ten years (since commit
      0ebfff14 ("[POWERPC] Add new interrupt mapping core and change
      platforms to use it")). It's also 0 on most other arches.
      
      Although it's fairly harmless, every now and then it causes confusion
      when a driver is built on powerpc and another arch which doesn't define
      NO_IRQ. There are at least 6 definitions of NO_IRQ in drivers/, at least
      some of which are to work around that problem.
      
      So we'd like to remove it. This is fairly trivial in the arch code, we
      just convert:
      
          if (irq == NO_IRQ)	to	if (!irq)
          if (irq != NO_IRQ)	to	if (irq)
          irq = NO_IRQ;	to	irq = 0;
          return NO_IRQ;	to	return 0;
      
      And a few other odd cases as well.
      
      At least for now we keep the #define NO_IRQ, because there is driver
      code that uses NO_IRQ and the fixes to remove those will go via other
      trees.
      
      Note we also change some occurrences in PPC sound drivers, drivers/ps3,
      and drivers/macintosh.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  26. 06 Sep, 2016: 1 commit
    • powerpc/pseries: Fix little endian build with CONFIG_KEXEC=n · d81d8258
      Thiago Jung Bauermann committed
      On ppc64le, builds with CONFIG_KEXEC=n fail with:
      
      arch/powerpc/platforms/pseries/setup.c: In function ‘pseries_big_endian_exceptions’:
      arch/powerpc/platforms/pseries/setup.c:403:13: error: implicit declaration of function ‘kdump_in_progress’
        if (rc && !kdump_in_progress())
      
      This is because pseries/setup.c includes <linux/kexec.h>, but
      kdump_in_progress() is defined in <asm/kexec.h>. This is a problem
      because the former only includes the latter if CONFIG_KEXEC_CORE=y.
      
      Fix it by including <asm/kexec.h> directly, as is done in powernv/setup.c.
      
      Fixes: d3cbff1b ("powerpc: Put exception configuration in a common place")
      Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  27. 21 Jul, 2016: 5 commits
  28. 21 Jun, 2016: 1 commit
    • powerpc/pci: Delay populating pdn · 8cc7581c
      Gavin Shan committed
      The pdn (struct pci_dn) instances are allocated from memblock or
      bootmem when creating PCI controllers (hoses) in setup_arch(). PCI
      hotplug, which will be supported by subsequent patches, releases
      PCI device nodes and their corresponding pdn on unplug events.
      The memory chunks for pdn instances allocated from memblock or
      bootmem are hard to reuse after being released.
      
      This delays creating pdn by pci_devs_phb_init() from setup_arch()
      to core_initcall() so that they are allocated from slab. The memory
      consumed by pdn can then be released to the system without problems
      at PCI unplug time. It means that pci_dn is unavailable in
      setup_arch() and the fixups on pdn (like AGP's) can't be carried
      out at that time. We have to do them in pcibios_root_bridge_prepare()
      on maple/pasemi/powermac platforms where/when the pdn is available.
      pcibios_root_bridge_prepare is called from subsys_initcall(), which
      is executed after core_initcall(), so the code flow does not change.
      
      Meanwhile, the EEH device is created when pdn is populated, meaning
      pdn and the EEH device have the same life cycle. In turn, we needn't
      call eeh_dev_init() to create the EEH device explicitly.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  29. 14 Jun, 2016: 1 commit