1. 20 Aug 2020: 2 commits
    • powerpc/pseries: Do not initiate shutdown when system is running on UPS · 90a9b102
      Committed by Vasant Hegde
      Per PAPR, we have to look at both the EPOW sensor value and the event
      modifier to identify the type of event and take the appropriate action.
      
      Section 10.2.2 of LoPAPR v1.1 includes table 136, "EPOW Action Codes":
      
        SYSTEM_SHUTDOWN 3
      
        The system must be shut down. An EPOW-aware OS logs the EPOW error
        log information, then schedules the system to be shut down to begin
        after an OS defined delay interval (default is 10 minutes.)
      
      Then in section 10.3.2.2.8 there is table 146 "Platform Event Log
      Format, Version 6, EPOW Section", which includes the "EPOW Event
      Modifier":
      
        For EPOW sensor value = 3
        0x01 = Normal system shutdown with no additional delay
        0x02 = Loss of utility power, system is running on UPS/Battery
        0x03 = Loss of system critical functions, system should be shutdown
        0x04 = Ambient temperature too high
        All other values = reserved
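      As an illustration, the table above can be read as a small decode step.
      This is a sketch, not the kernel's EPOW handler: the enum names and
      epow_shutdown_action() are hypothetical, and the mapping simply encodes
      the behaviour this commit asks for, where only 0x02 (on UPS/battery) is
      left to userspace instead of triggering an immediate shutdown.

```c
/*
 * Event modifier values for EPOW sensor value 3 (SYSTEM_SHUTDOWN),
 * per LoPAPR v1.1 table 146. The enumerator names are illustrative.
 */
enum epow_shutdown_modifier {
	EPOW_MOD_NORMAL       = 0x01, /* normal shutdown, no additional delay */
	EPOW_MOD_ON_UPS       = 0x02, /* utility power lost, on UPS/battery */
	EPOW_MOD_LOSS_CRIT    = 0x03, /* loss of system critical functions */
	EPOW_MOD_AMBIENT_TEMP = 0x04, /* ambient temperature too high */
};

/* Hypothetical classification of what the OS should do. */
enum epow_action {
	ACTION_SHUTDOWN,      /* start an orderly shutdown now */
	ACTION_DEFER_TO_USER, /* only log; a userspace tool decides */
	ACTION_RESERVED,      /* reserved modifier value */
};

static enum epow_action epow_shutdown_action(unsigned int modifier)
{
	switch (modifier) {
	case EPOW_MOD_NORMAL:
	case EPOW_MOD_LOSS_CRIT:
	case EPOW_MOD_AMBIENT_TEMP:
		return ACTION_SHUTDOWN;
	case EPOW_MOD_ON_UPS:
		/*
		 * Keep running on the UPS; the userspace tool schedules
		 * (and may cancel) the shutdown itself.
		 */
		return ACTION_DEFER_TO_USER;
	default:
		return ACTION_RESERVED;
	}
}
```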
      
      We have a user space tool (rtas_errd) on the LPAR that monitors for
      EPOW_SHUTDOWN_ON_UPS. Once it receives such an event, it schedules a
      shutdown after a predefined time and keeps monitoring for new EPOW
      events. If it receives a "Power restored" event before that time
      elapses, it cancels the shutdown; otherwise it shuts down the system
      once the time is up.
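      The rtas_errd behaviour described above amounts to a cancellable timer.
      The sketch below replays a list of timestamped events and reports when,
      if ever, the shutdown would fire. It is a simplified model, not
      rtas_errd's code: the event names, simulate_ups_shutdown() and the use
      of the 10-minute default as `delay` are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical event types seen by a userspace EPOW monitor. */
enum epow_event_type {
	EV_SHUTDOWN_ON_UPS, /* running on UPS/battery */
	EV_POWER_RESTORED,  /* utility power is back */
};

struct epow_event {
	long time;                 /* seconds, non-decreasing */
	enum epow_event_type type;
};

/*
 * Replay events in time order. An ON_UPS event arms a shutdown
 * deadline `delay` seconds later (if none is pending); a
 * POWER_RESTORED event that arrives before the deadline cancels it.
 * Returns the time the shutdown fires, or -1 if it never does.
 */
static long simulate_ups_shutdown(const struct epow_event *ev, size_t n,
				  long delay)
{
	long deadline = -1; /* -1 means no shutdown pending */

	for (size_t i = 0; i < n; i++) {
		/* A pending shutdown fires before later events are seen. */
		if (deadline >= 0 && ev[i].time >= deadline)
			return deadline;

		if (ev[i].type == EV_SHUTDOWN_ON_UPS && deadline < 0)
			deadline = ev[i].time + delay;
		else if (ev[i].type == EV_POWER_RESTORED)
			deadline = -1; /* cancel the pending shutdown */
	}
	/* No more events: a pending deadline simply expires. */
	return deadline;
}
```

      For example, an ON_UPS event at t=0 followed by "Power restored" at
      t=120, with a 600 s delay, returns -1 (no shutdown).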
      
      Commit 79872e35 ("powerpc/pseries: All events of
      EPOW_SYSTEM_SHUTDOWN must initiate shutdown") changed our handling of
      the "on UPS/Battery" case to shut down the system immediately. This
      breaks existing setups that rely on the userspace tool to delay the
      shutdown and let the system run on the UPS.
      
      Fixes: 79872e35 ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown")
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      [mpe: Massage change log and add PAPR references]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200820061844.306460-1-hegdevasant@linux.vnet.ibm.com
    • powerpc/powernv/pci: Fix possible crash when releasing DMA resources · e17a7c0e
      Committed by Frederic Barrat
      Fix a typo introduced during recent code cleanup, which could lead to
      resources silently not being freed, or to an oops (on PCI hotplug or
      CAPI reset).
      
      Only ioda2 is affected; the code path for ioda1 is correct.
      
      Fixes: 01e12629 ("powerpc/powernv/pci: Add explicit tracking of the DMA setup state")
      Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
      Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200819130741.16769-1-fbarrat@linux.ibm.com
  2. 18 Aug 2020: 1 commit
    • powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death · 801980f6
      Committed by Michael Roth
      For a power9 KVM guest with XIVE enabled, running a test loop
      where we hotplug 384 vcpus and then unplug them, the following traces
      can be seen (generally within a few loops) either from the unplugged
      vcpu:
      
        cpu 65 (hwid 65) Ready to die...
        Querying DEAD? cpu 66 (66) shows 2
        list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
        ------------[ cut here ]------------
        kernel BUG at lib/list_debug.c:56!
        Oops: Exception in kernel mode, sig: 5 [#1]
        LE SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ...
        CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1
        NIP:  c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac
        REGS: c0000009e5a17840 TRAP: 0700   Not tainted  (4.18.0-221.el8.ppc64le)
        MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28000842  XER: 20040000
        ...
        NIP __list_del_entry_valid+0xac/0x100
        LR  __list_del_entry_valid+0xa8/0x100
        Call Trace:
          __list_del_entry_valid+0xa8/0x100 (unreliable)
          free_pcppages_bulk+0x1f8/0x940
          free_unref_page+0xd0/0x100
          xive_spapr_cleanup_queue+0x148/0x1b0
          xive_teardown_cpu+0x1bc/0x240
          pseries_mach_cpu_die+0x78/0x2f0
          cpu_die+0x48/0x70
          arch_cpu_idle_dead+0x20/0x40
          do_idle+0x2f4/0x4c0
          cpu_startup_entry+0x38/0x40
          start_secondary+0x7bc/0x8f0
          start_secondary_prolog+0x10/0x14
      
      or on the worker thread handling the unplug:
      
        pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
        Querying DEAD? cpu 314 (314) shows 2
        BUG: Bad page state in process kworker/u768:3  pfn:95de1
        cpu 314 (hwid 314) Ready to die...
        page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
        flags: 0x5ffffc00000000()
        raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000
        raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
        page dumped because: nonzero mapcount
        Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ...
        CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
        Workqueue: pseries hotplug workque pseries_hp_work_fn
        Call Trace:
          dump_stack+0xb0/0xf4 (unreliable)
          bad_page+0x12c/0x1b0
          free_pcppages_bulk+0x5bc/0x940
          page_alloc_cpu_dead+0x118/0x120
          cpuhp_invoke_callback.constprop.5+0xb8/0x760
          _cpu_down+0x188/0x340
          cpu_down+0x5c/0xa0
          cpu_subsys_offline+0x24/0x40
          device_offline+0xf0/0x130
          dlpar_offline_cpu+0x1c4/0x2a0
          dlpar_cpu_remove+0xb8/0x190
          dlpar_cpu_remove_by_index+0x12c/0x150
          dlpar_cpu+0x94/0x800
          pseries_hp_work_fn+0x128/0x1e0
          process_one_work+0x304/0x5d0
          worker_thread+0xcc/0x7a0
          kthread+0x1ac/0x1c0
          ret_from_kernel_thread+0x5c/0x80
      
      The latter trace is due to the following sequence:
      
        page_alloc_cpu_dead
          drain_pages
            drain_pages_zone
              free_pcppages_bulk
      
      where drain_pages() in this case is called under the assumption that
      the unplugged cpu is no longer executing. To ensure that is the case,
      an early call is made to __cpu_die()->pseries_cpu_die(), which runs a
      loop that waits for the cpu to reach a halted state by polling its
      status via query-cpu-stopped-state RTAS calls. However, it only polls
      for 25 iterations before giving up, and in the trace above this
      results in the following being printed only 0.1 seconds after the
      hotplug worker thread begins processing the unplug request:
      
        pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
        Querying DEAD? cpu 314 (314) shows 2
      
      At that point the worker thread assumes the unplugged CPU is in some
      unknown/dead state and proceeds with the cleanup, racing with the
      XIVE cleanup code still being executed by the unplugged CPU.
      
      Fix this by waiting indefinitely, but also making an effort to avoid
      spurious lockup messages by allowing for rescheduling after polling
      the CPU status and printing a warning if we wait for longer than 120s.
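      A minimal userspace sketch of the "wait indefinitely, but reschedule
      and warn" loop described above. It is not the pseries_cpu_die() code:
      wait_for_cpu_stop(), the injected query/clock callbacks and the fake
      CPU are hypothetical, sched_yield() stands in for the kernel's
      rescheduling point, and the status values assume 0 means stopped and
      2 means not yet stopped (the value seen in the "shows 2" lines).

```c
#include <sched.h>
#include <stdio.h>

/* Assumed status codes: 0 = stopped, 2 = not yet stopped. */
enum cpu_state { CPU_STOPPED = 0, CPU_NOT_STOPPED = 2 };

/*
 * Poll until the CPU reports CPU_STOPPED, with no iteration limit.
 * Yields after every poll and warns once if the wait exceeds 120 s.
 * Returns the number of polls it took.
 */
static unsigned long wait_for_cpu_stop(int cpu,
				       enum cpu_state (*query)(int cpu),
				       unsigned long (*now_ms)(void))
{
	const unsigned long warn_after_ms = 120UL * 1000;
	unsigned long start = now_ms();
	unsigned long polls = 0;
	int warned = 0;

	for (;;) {
		polls++;
		if (query(cpu) == CPU_STOPPED)
			return polls;

		/* Let other tasks run between polls (cond_resched() stand-in). */
		sched_yield();

		if (!warned && now_ms() - start > warn_after_ms) {
			fprintf(stderr, "cpu %d: still waiting for it to stop\n", cpu);
			warned = 1;
		}
	}
}

/* Hypothetical test doubles: a CPU that stops after N polls, fake clock. */
static unsigned long fake_running_polls;
static unsigned long fake_clock_ms;

static enum cpu_state fake_query(int cpu)
{
	(void)cpu;
	if (fake_running_polls > 0) {
		fake_running_polls--;
		return CPU_NOT_STOPPED;
	}
	return CPU_STOPPED;
}

static unsigned long fake_now_ms(void)
{
	fake_clock_ms += 1000; /* each reading advances the clock by 1 s */
	return fake_clock_ms;
}
```

      Unlike a fixed 25-iteration limit, the loop cannot give up early; the
      one-time warning keeps a genuinely stuck CPU visible in the log.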
      
      Fixes: eac1e731 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Tested-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      [mpe: Trim oopses in change log slightly for readability]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200811161544.10513-1-mdroth@linux.vnet.ibm.com
  3. 08 Aug 2020: 1 commit
    • mm: remove unneeded includes of <asm/pgalloc.h> · ca15ca40
      Committed by Mike Rapoport
      Patch series "mm: cleanup usage of <asm/pgalloc.h>"
      
      Most architectures have very similar versions of pXd_alloc_one() and
      pXd_free_one() for intermediate levels of page table.  These patches add
      generic versions of these functions in <asm-generic/pgalloc.h> and enable
      use of the generic functions where appropriate.
      
      In addition, functions declared and defined in <asm/pgalloc.h> headers are
      used mostly by core mm and by early mm initialization in arch code, so
      there is no real reason to include <asm/pgalloc.h> all over the place.
      The first patch in this series removes the unneeded includes of
      <asm/pgalloc.h>.
      
      In the end it didn't work out as neatly as I hoped and moving
      pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
      unnecessary changes to arches that have custom page table allocations, so
      I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
      to mm/.
      
      This patch (of 8):
      
      In most cases the <asm/pgalloc.h> header is required only for allocating
      page table memory.  Most of the .c files that include that header do not
      use any symbols declared in <asm/pgalloc.h> and do not need it.
      
      As for the other header files that used to include <asm/pgalloc.h>, it is
      possible to move that include into the .c file that actually uses symbols
      from <asm/pgalloc.h> and drop the include from the header file.
      
      The process was somewhat automated using
      
      	sed -i -E '/[<"]asm\/pgalloc\.h/d' \
                      $(grep -L -w -f /tmp/xx \
                              $(git grep -E -l '[<"]asm/pgalloc\.h'))
      
      where /tmp/xx contains all the symbols defined in
      arch/*/include/asm/pgalloc.h.
      
      [rppt@linux.ibm.com: fix powerpc warning]
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 03 Aug 2020: 1 commit
  5. 31 Jul 2020: 2 commits
    • powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric · af0870c4
      Committed by Vaibhav Jain
      Add support for reporting the 'fuel-gauge' NVDIMM metric via the
      PAPR_PDSM_HEALTH pdsm payload. The 'fuel-gauge' metric indicates the
      remaining usage life of a papr-scm compatible NVDIMM. PHYP exposes this
      metric via the H_SCM_PERFORMANCE_STATS hcall.
      
      The metric value is returned from the pdsm by extending the return
      payload 'struct nd_papr_pdsm_health' without breaking the ABI. A new
      field 'dimm_fuel_gauge' holding the metric value is introduced at the
      end of the payload struct, and its presence is indicated by the
      extension flag PDSM_DIMM_HEALTH_RUN_GAUGE_VALID.
      
      The patch introduces a new function papr_pdsm_fuel_gauge() that is
      called from papr_pdsm_health(). If fetching NVDIMM performance stats
      is supported, papr_pdsm_fuel_gauge() allocates an output buffer
      large enough to hold the performance stat and passes it to
      drc_pmem_query_stats(), which issues the HCALL to PHYP. The returned
      stat value is then stored in the 'struct
      nd_papr_pdsm_health.dimm_fuel_gauge' field, and the extension flag
      'PDSM_DIMM_HEALTH_RUN_GAUGE_VALID' is set in 'struct
      nd_papr_pdsm_health.extension_flags'.
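      The ABI-compatible extension pattern can be sketched in plain C. The
      layout below is hypothetical and much smaller than the real struct
      nd_papr_pdsm_health; the bit value of PDSM_DIMM_HEALTH_RUN_GAUGE_VALID,
      the payload size and the helper names are assumptions. In this sketch
      the payload keeps a fixed size through a reserved buffer, the new
      field sits after the existing ones, and consumers trust it only when
      the extension flag is set.

```c
#include <stdint.h>
#include <string.h>

/* Flag name taken from the commit; the bit value is an assumption. */
#define PDSM_DIMM_HEALTH_RUN_GAUGE_VALID (1u << 0)
#define HEALTH_PAYLOAD_SIZE 64 /* illustrative fixed payload size */

/* Hypothetical, simplified fields; new ones are only ever appended. */
struct health_fields {
	uint32_t extension_flags; /* which optional fields are valid */
	uint16_t dimm_health;     /* pre-existing field */
	uint16_t dimm_fuel_gauge; /* new field, appended at the end */
};

/* The reserved buffer keeps sizeof() constant as fields are added. */
union health_payload {
	struct health_fields f;
	uint8_t buf[HEALTH_PAYLOAD_SIZE];
};

/* Producer: set the gauge and its flag only if the stat was fetched. */
static void fill_health(union health_payload *p, uint16_t health,
			int have_gauge, uint16_t gauge)
{
	memset(p, 0, sizeof(*p));
	p->f.dimm_health = health;
	if (have_gauge) {
		p->f.dimm_fuel_gauge = gauge;
		p->f.extension_flags |= PDSM_DIMM_HEALTH_RUN_GAUGE_VALID;
	}
}

/* Consumer: returns the gauge, or -1 when the field is not valid. */
static int read_fuel_gauge(const union health_payload *p)
{
	if (!(p->f.extension_flags & PDSM_DIMM_HEALTH_RUN_GAUGE_VALID))
		return -1;
	return p->f.dimm_fuel_gauge;
}
```

      An older consumer that does not know about the flag keeps reading the
      same offsets and ignores the new field, which is what lets the payload
      grow without an ABI break.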
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200731064153.182203-3-vaibhav@linux.ibm.com
    • powerpc/papr_scm: Fetch nvdimm performance stats from PHYP · 2d02bf83
      Committed by Vaibhav Jain
      Update papr_scm.c to query dimm performance statistics from PHYP via
      the H_SCM_PERFORMANCE_STATS hcall and export them to user-space as the
      PAPR-specific NVDIMM attribute 'perf_stats' in sysfs. The patch also
      provides sysfs ABI documentation for the reported stats and their
      meanings.
      
      At NVDIMM probe time, papr_scm_nvdimm_init() issues a special variant
      of the H_SCM_PERFORMANCE_STATS hcall to check whether collection of
      performance statistics is supported. If it is, PHYP returns the
      maximum buffer length needed to read all performance stats. This
      value is stored in a per-nvdimm attribute 'stat_buffer_len'.
      
      The layout of the request buffer for reading NVDIMM performance stats
      from PHYP is defined in 'struct papr_scm_perf_stats' and 'struct
      papr_scm_perf_stat'. These structs are used in the newly introduced
      drc_pmem_query_stats(), which issues the H_SCM_PERFORMANCE_STATS hcall.
      
      The sysfs access function perf_stats_show() uses the value of
      'stat_buffer_len' to allocate a buffer large enough to hold all
      possible NVDIMM performance stats and passes it to
      drc_pmem_query_stats() to populate. Finally, the statistics reported
      in the buffer are formatted into the sysfs output buffer.
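      The two-step flow above (learn the required buffer length, then
      allocate and query) can be sketched as follows. Everything here is
      hypothetical: fake_query_stats() stands in for the
      H_SCM_PERFORMANCE_STATS hcall, the stat layout is simplified and in
      host byte order, and the "id = value" output format is illustrative
      rather than what perf_stats_show() actually emits. For brevity the
      sketch probes the length on every call instead of caching it the way
      'stat_buffer_len' is cached at probe time.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical layout (host byte order). */
struct perf_stat {
	char stat_id[8]; /* fixed width, not necessarily NUL-terminated */
	uint64_t stat_val;
};

struct perf_stats {
	uint32_t num_statistics;
	struct perf_stat stat[]; /* flexible array member */
};

/*
 * Stand-in for the hcall: with buf == NULL it only reports the buffer
 * length needed for all stats; otherwise it fills buf (-1 if too small).
 */
static long fake_query_stats(struct perf_stats *buf, size_t len)
{
	static const struct perf_stat src[] = { { "reads", 42 }, { "writes", 7 } };
	size_t n = sizeof(src) / sizeof(src[0]);
	size_t need = sizeof(struct perf_stats) + sizeof(src);

	if (!buf)
		return (long)need;
	if (len < need)
		return -1;
	buf->num_statistics = (uint32_t)n;
	memcpy(buf->stat, src, sizeof(src));
	return 0;
}

/* Learn the length, allocate, query, then format "id = value" lines. */
static int perf_stats_format(char *out, size_t outlen)
{
	long need;
	struct perf_stats *stats;
	size_t used = 0;

	if (outlen == 0)
		return -1;
	out[0] = '\0';

	need = fake_query_stats(NULL, 0);
	if (need <= 0)
		return -1;
	stats = calloc(1, (size_t)need);
	if (!stats)
		return -1;
	if (fake_query_stats(stats, (size_t)need) != 0) {
		free(stats);
		return -1;
	}

	for (uint32_t i = 0; i < stats->num_statistics; i++) {
		int w = snprintf(out + used, outlen - used, "%.8s = %llu\n",
				 stats->stat[i].stat_id,
				 (unsigned long long)stats->stat[i].stat_val);
		if (w < 0 || (size_t)w >= outlen - used) {
			free(stats);
			return -1; /* output buffer too small */
		}
		used += (size_t)w;
	}
	free(stats);
	return (int)used;
}
```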
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com
  6. 30 Jul 2020: 3 commits
  7. 29 Jul 2020: 7 commits
  8. 26 Jul 2020: 23 commits