1. 26 Jun, 2014 1 commit
  2. 12 Jun, 2014 1 commit
  3. 11 Jun, 2014 15 commits
    • M
      powerpc/book3s: Increment the mce counter during machine_check_early call. · e6654d5b
      Mahesh Salgaonkar committed
      The MCE counter in /proc/interrupts does not increase, which gives the
      false impression that no MCE occurred even when there were MCE events.
      The machine check early handling was added for PowerKVM and we failed to
      increment the MCE count in the early handler.
      
      We also increment MCE counters in the machine_check_exception call, but
      in most cases where we handle the error the hypervisor never reaches there
      unless it's fatal and we want to crash. Only in a fatal situation may we
      see a double increment of the MCE count. We need to fix that. But for
      now it is always good to have some count increased instead of zero.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      e6654d5b
    • M
      powerpc/book3s: Add stack overflow check in machine check handler. · e75ad93a
      Mahesh Salgaonkar committed
      Currently the machine check handler does not check for stack overflow on
      nested machine checks. If we repeatedly hit another MCE from the same
      address while inside the machine check handler, we risk a stack
      overflow, which can cause massive memory corruption. This patch limits the
      nested MCE level to 4 and panics when we cross level 4.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      e75ad93a
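The nesting limit described in this commit can be pictured with a small user-space model. This is only a sketch: the struct, the function names and the plain C counter are assumptions for illustration, not the kernel's actual handler code.

```c
#include <stdbool.h>

/* Limit taken from the commit message: panic once nesting goes beyond 4. */
#define MAX_MCE_NEST_LEVEL 4

/* Hypothetical per-CPU bookkeeping used only by this sketch. */
struct mce_state {
    int nest_level;   /* machine checks currently being handled */
};

/*
 * Called on entry to the handler. Returns true if it is safe to continue,
 * false if the nesting limit was crossed and the caller should panic.
 */
bool mce_enter(struct mce_state *st)
{
    st->nest_level++;
    return st->nest_level <= MAX_MCE_NEST_LEVEL;
}

/* Called when the handler returns normally. */
void mce_exit(struct mce_state *st)
{
    if (st->nest_level > 0)
        st->nest_level--;
}
```

A caller would treat a false return from mce_enter() as the point at which to stop handling and panic rather than consume more stack.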
    • M
      powerpc/book3s: Fix machine check handling for unhandled errors · 2749a2f2
      Mahesh Salgaonkar committed
      Current code does not check for unhandled/unrecovered errors and returns from
      the interrupt if it is a recoverable exception, which in turn triggers the same
      machine check exception in a loop, leaving the hypervisor unresponsive.
      
      This patch fixes this situation and forces the hypervisor to panic for
      unhandled/unrecovered errors.
      
      This patch also fixes another issue where the unrecoverable_exception routine
      was called in real mode in case of an unrecoverable exception (MSR_RI = 0).
      This causes another exception, vector 0x300 (data access), during system crash,
      leading to confusion while debugging the cause of the system crash.
      
      Also turn the ME bit off while going down, so that when another MCE is hit during
      the panic path, the system will checkstop and the hypervisor will get restarted
      cleanly by the SP.
      
      With the above fixes we now throw correct console messages (see below) while
      crashing the system in case of unhandled/unrecoverable machine checks.
      
      --------------
      Severe Machine check interrupt [[Not recovered]
        Initiator: CPU
        Error type: UE [Instruction fetch]
          Effective address: 0000000030002864
      Oops: Machine check, sig: 7 [#1]
      SMP NR_CPUS=2048 NUMA PowerNV
      Modules linked in: bork(O) bridge stp llc kvm [last unloaded: bork]
      CPU: 36 PID: 55162 Comm: bash Tainted: G           O 3.14.0mce #1
      task: c000002d72d022d0 ti: c000000007ec0000 task.ti: c000002d72de4000
      NIP: 0000000030002864 LR: 00000000300151a4 CTR: 000000003001518c
      REGS: c000000007ec3d80 TRAP: 0200   Tainted: G           O  (3.14.0mce)
      MSR: 9000000000041002 <SF,HV,ME,RI>  CR: 28222848  XER: 20000000
      CFAR: 0000000030002838 DAR: d0000000004d0000 DSISR: 00000000 SOFTE: 1
      GPR00: 000000003001512c 0000000031f92cb0 0000000030078af0 0000000030002864
      GPR04: d0000000004d0000 0000000000000000 0000000030002864 ffffffffffffffc9
      GPR08: 0000000000000024 0000000030008af0 000000000000002c c00000000150e728
      GPR12: 9000000000041002 0000000031f90000 0000000010142550 0000000040000000
      GPR16: 0000000010143cdc 0000000000000000 00000000101306fc 00000000101424dc
      GPR20: 00000000101424e0 000000001013c6f0 0000000000000000 0000000000000000
      GPR24: 0000000010143ce0 00000000100f6440 c000002d72de7e00 c000002d72860250
      GPR28: c000002d72860240 c000002d72ac0038 0000000000000008 0000000000040000
      NIP [0000000030002864] 0x30002864
      LR [00000000300151a4] 0x300151a4
      Call Trace:
      Instruction dump:
      XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
      XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
      ---[ end trace 7285f0beac1e29d3 ]---
      
      Sending IPI to other CPUs
      IPI complete
      OPAL V3 detected !
      --------------
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      2749a2f2
    • G
      powerpc/eeh: Dump PE location code · 357b2f3d
      Gavin Shan committed
      As Ben suggested, it's meaningful to dump the PE's location code
      for site engineers when hitting EEH errors. The patch introduces
      function eeh_pe_loc_get() to retrieve the location code from
      the device tree so that we can output it when hitting EEH errors.
      
      If the primary PE bus is the root bus, the PHB's dev-node is tried
      prior to the root port's dev-node. Otherwise, the upstream bridge's
      dev-node of the primary PE bus is checked for the location code
      directly.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      357b2f3d
    • M
      powerpc/powernv: Enable POWER8 doorbell IPIs · d4e58e59
      Michael Neuling committed
      This patch enables POWER8 doorbell IPIs on powernv.
      
      Since doorbells can only IPI within a core, we test to see when we can use
      doorbells and if not we fall back to XICS.  This also enables hypervisor
      doorbells to wake us up from nap/sleep via the LPCR PECEDH bit.
      
      Based on tests by Anton, the best case IPI latency between two threads dropped
      from 894ns to 512ns.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      d4e58e59
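The "doorbell if same core, otherwise XICS" decision from this commit might look roughly like the following. The enum, the function names and the assumption that threads are numbered contiguously within a core are illustrative only and not taken from the patch.

```c
enum ipi_method { IPI_DOORBELL, IPI_XICS };

/* Illustrative helper: assumes threads are numbered contiguously per core. */
static int core_of(int cpu, int threads_per_core)
{
    return cpu / threads_per_core;
}

/* Pick the doorbell only when sender and target share a core. */
enum ipi_method pick_ipi_method(int src_cpu, int dst_cpu, int threads_per_core)
{
    if (core_of(src_cpu, threads_per_core) == core_of(dst_cpu, threads_per_core))
        return IPI_DOORBELL;
    return IPI_XICS;   /* fall back to the interrupt controller */
}
```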
    • G
      powerpc/powernv: Fix killed EEH event · 5c7a35e3
      Gavin Shan committed
      On the PowerNV platform, EEH errors are reported by IO accessors or by a poller
      driven by interrupt. After the PE is isolated, we won't produce an EEH
      event for the PE. The current implementation can lose an EEH
      event in this way:
      
      The interrupt handler queues one "special" event, which drives the poller.
      The EEH thread hasn't picked up the special event yet. IO accessors kick in, the
      frozen PE is marked as "isolated" and an EEH event is queued to the list.
      The EEH thread runs because of the special event and purges all existing EEH events.
      However, we never produce another EEH event for the frozen PE. Eventually,
      the PE is marked as "isolated" and we don't have an EEH event to recover it.
      
      The patch fixes the issue by keeping EEH events for PEs that have been
      marked as "isolated", with the help of an additional "force" argument to
      eeh_remove_event().
      Reported-by: Rolf Brudeseth <rolfb@us.ibm.com>
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      5c7a35e3
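A simplified model of the purge behaviour described above: without "force", events that belong to an already isolated PE survive the purge. The types and the purge_events() name are hypothetical stand-ins; the real eeh_remove_event() works on the kernel's own event list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the kernel structures. */
struct pe { bool isolated; };
struct eeh_event { struct pe *pe; bool queued; };

/*
 * Purge queued events. Without force, events for PEs already marked
 * isolated are kept so the recovery thread still gets to see them.
 * Returns the number of events removed.
 */
size_t purge_events(struct eeh_event *events, size_t n, bool force)
{
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!events[i].queued)
            continue;
        if (!force && events[i].pe != NULL && events[i].pe->isolated)
            continue;   /* keep the event for the isolated PE */
        events[i].queued = false;
        removed++;
    }
    return removed;
}
```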
    • P
      powerpc: fix typo 'CONFIG_PMAC' · 6e0fdf9a
      Paul Bolle committed
      Commit b0d278b7 ("powerpc/perf_event: Reduce latency of calling
      perf_event_do_pending") added a check for CONFIG_PMAC where a check for
      CONFIG_PPC_PMAC was clearly intended.
      
      Fixes: b0d278b7 ("powerpc/perf_event: Reduce latency of calling perf_event_do_pending")
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      6e0fdf9a
    • G
      powerpc/eeh: Report frozen parent PE prior to child PE · 1ad7a72c
      Gavin Shan committed
      When we have the corner case of a frozen parent and child PE at the
      same time, we have to handle the frozen parent PE prior to the
      child. Without clearing the frozen state on the parent PE, the child
      PE can't be recovered successfully.
      
      The patch searches the EEH PE hierarchy tree and returns the topmost
      frozen PE to be handled. It ensures the frozen parent PE will be
      handled prior to the child PE.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1ad7a72c
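The search for the topmost frozen PE can be sketched as a walk up the parent chain. The pe_node type and the topmost_frozen() name are assumptions made for this example, not the kernel's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PE node with just the fields this sketch needs. */
struct pe_node {
    struct pe_node *parent;   /* NULL at the root of the hierarchy */
    bool frozen;
};

/*
 * Walk from pe towards the root and remember the highest frozen PE seen.
 * Returns NULL if neither pe nor any ancestor is frozen.
 */
struct pe_node *topmost_frozen(struct pe_node *pe)
{
    struct pe_node *top = NULL;

    for (struct pe_node *p = pe; p != NULL; p = p->parent)
        if (p->frozen)
            top = p;
    return top;
}
```

Handling the returned PE first guarantees that a frozen parent is dealt with before its child.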
    • G
      powerpc/eeh: Clear frozen state for child PE · 2c665992
      Gavin Shan committed
      Since commit cb523e09 ("powerpc/eeh: Avoid I/O access during PE
      reset"), the PE is kept in frozen state at hardware level until
      the PE reset is done completely. After that, we explicitly clear
      the frozen state of the affected PE. However, there might be
      frozen child PEs of the affected PE and we need to clear their
      frozen state as well. Otherwise, the recovery is going to fail.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      2c665992
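Clearing the frozen state on a PE together with its children amounts to a subtree traversal. The sketch below uses a made-up pe_tree type with a fixed-size child array; the kernel uses its own PE lists and firmware calls instead.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4   /* arbitrary bound for this sketch */

/* Hypothetical PE tree node. */
struct pe_tree {
    bool frozen;
    struct pe_tree *children[MAX_CHILDREN];
    size_t nr_children;
};

/*
 * Clear the frozen flag on pe and on every descendant.
 * Returns how many PEs were actually unfrozen.
 */
int clear_frozen_subtree(struct pe_tree *pe)
{
    int cleared = 0;

    if (pe == NULL)
        return 0;
    if (pe->frozen) {
        pe->frozen = false;
        cleared++;
    }
    for (size_t i = 0; i < pe->nr_children; i++)
        cleared += clear_frozen_subtree(pe->children[i]);
    return cleared;
}
```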
    • M
      powerpc: Don't setup CPUs with bad status · 59a53afe
      Michael Neuling committed
      OPAL will mark a CPU that is guarded as "bad" in the status property of the CPU
      node.
      
      Unfortunately Linux doesn't check this property and will put the bad CPU in the
      present map.  This has caused hangs on booting when we try to unsplit the core.
      
      This patch checks that the CPU is available via this status property before
      putting it in the present map.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Tested-by: Anton Blanchard <anton@samba.org>
      cc: stable@vger.kernel.org
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      59a53afe
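A minimal sketch of such a status check, assuming the usual device-tree convention that a missing status property, "okay" or "ok" means the node is usable. The function name is hypothetical; in the kernel this kind of test is done through the device-tree helpers rather than raw strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * status is the value of the CPU node's "status" property,
 * or NULL when the property is absent.
 */
bool cpu_node_available(const char *status)
{
    if (status == NULL)
        return true;   /* no property: treated as available */
    return strcmp(status, "okay") == 0 || strcmp(status, "ok") == 0;
}
```

With this check a CPU whose status reads "bad" would be skipped instead of being added to the present map.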
    • S
      powerpc: Correct DSCR during TM context switch · 96d01610
      Sam Bobroff committed
      Correct the DSCR SPR becoming temporarily corrupted if a task is
      context switched during a transaction.
      
      The problem occurs while suspending the task and is caused by saving
      the DSCR to thread.dscr after it has already been set to the CPU's
      default value:
      
      __switch_to() calls __switch_to_tm()
      	which calls tm_reclaim_task()
      	which calls tm_reclaim_thread()
      	which calls tm_reclaim()
      		where the DSCR is set to the CPU's default
      __switch_to() calls _switch()
      		where thread.dscr is set to the DSCR
      
      When the task is resumed, its transaction will be doomed (as usual)
      and the DSCR SPR will be corrupted, although the checkpointed value
      will be correct. Therefore the DSCR will be immediately corrected by
      the transaction aborting, unless it has been suspended. In that case
      the incorrect value can be seen by the task until it resumes the
      transaction.
      
      The fix is to treat the DSCR similarly to the TAR and save it early
      in __switch_to().
      
      A program exposing the problem is added to the kernel self tests as:
      tools/testing/selftests/powerpc/tm/tm-resched-dscr.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      CC: <stable@vger.kernel.org> [v3.10+]
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      96d01610
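The ordering problem in this commit can be reproduced with a toy model. Everything here (the two structs, model_tm_reclaim(), the two switch_out_* functions) is invented for illustration; it only mirrors the order of operations described in the message.

```c
/* Toy model of the state involved; not the kernel's structures. */
struct cpu_model {
    unsigned long dscr_spr;       /* current value of the DSCR register */
    unsigned long dscr_default;   /* the CPU's default DSCR value */
};

struct task_model {
    unsigned long saved_dscr;     /* stands in for thread.dscr */
};

/* Stand-in for tm_reclaim(): the DSCR is reset to the CPU default. */
static void model_tm_reclaim(struct cpu_model *cpu)
{
    cpu->dscr_spr = cpu->dscr_default;
}

/* Old order: the save happens after reclaim, so the default is saved. */
void switch_out_buggy(struct cpu_model *cpu, struct task_model *t)
{
    model_tm_reclaim(cpu);
    t->saved_dscr = cpu->dscr_spr;
}

/* Fixed order: save the task's DSCR early, then reclaim. */
void switch_out_fixed(struct cpu_model *cpu, struct task_model *t)
{
    t->saved_dscr = cpu->dscr_spr;
    model_tm_reclaim(cpu);
}
```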
    • M
      powerpc: Remove platforms/wsp and associated pieces · fb5a5157
      Michael Ellerman committed
      __attribute__ ((unused))
      
      WSP is the last user of CONFIG_PPC_A2, so we remove that as well.
      
      Although CONFIG_PPC_ICSWX still exists, it's no longer selectable for
      any Book3E platform, so we can remove the code in mmu-book3e.h that
      depended on it.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      fb5a5157
    • P
      powerpc: Remove check for CONFIG_SERIAL_TEXT_DEBUG · 94314290
      Paul Bolle committed
      The Kconfig symbol SERIAL_TEXT_DEBUG was removed from
      arch/powerpc/Kconfig.debug in v2.6.22. (In v2.6.27 it was also removed
      from arch/ppc/Kconfig.debug.) So the check for its macro has evaluated
      to false for over five years now. Remove that check and the few lines
      of code hidden behind it.
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      94314290
    • B
      powerpc: Add AT_HWCAP2 to indicate V.CRYPTO category support · dd58a092
      Benjamin Herrenschmidt committed
      The Vector Crypto category instructions are supported by current POWER8
      chips. Advertise them to userspace using a specific bit to properly
      differentiate them from chips of the same architecture level that might not
      have them.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: <stable@vger.kernel.org> [v3.10+]
      dd58a092
    • T
      booke/watchdog: refine and clean up the codes · d2deebab
      Tang Yuantian committed
      Basically, this patch does the following:
      1. Move the code that parses boot parameters from setup-common.c
         to the driver. This way, readers of the code can see directly that
         there are boot parameters that can change the timeout.
      2. Make the boot parameter 'booke_wdt_period' effective.
         Currently, when the driver is loaded, the default timeout is always
         used instead of booke_wdt_period.
      3. Wrap up the watchdog timeout in the device struct and clean up
         unnecessary code.
      Signed-off-by: Tang Yuantian <yuantian.tang@freescale.com>
      Acked-by: Scott Wood <scottwood@freescale.com>
      Reviewed-by: Li Yang <leoli@freescale.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
      d2deebab
  4. 07 Jun, 2014 1 commit
  5. 05 Jun, 2014 4 commits
    • N
      sched: Rename capacity related flags · 5d4dfddd
      Nicolas Pitre committed
      It is better not to think about compute capacity as being equivalent
      to "CPU power".  The upcoming "power aware" scheduler work may create
      confusion with the notion of energy consumption if "power" is used too
      liberally.
      
      Let's rename the following feature flags since they do relate to capacity:
      
      	SD_SHARE_CPUPOWER  -> SD_SHARE_CPUCAPACITY
      	ARCH_POWER         -> ARCH_CAPACITY
      	NONTASK_POWER      -> NONTASK_CAPACITY
      Signed-off-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: linaro-kernel@lists.linaro.org
      Cc: Andy Fleming <afleming@freescale.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: devicetree@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/n/tip-e93lpnxb87owfievqatey6b5@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5d4dfddd
    • A
      powerpc: Allow ppc_md platform hook to override memory_block_size_bytes · a5d86257
      Anton Blanchard committed
      The pseries platform code unconditionally overrides
      memory_block_size_bytes regardless of the running platform.
      
      Create a ppc_md hook so that each platform can choose to
      do what it wants.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      a5d86257
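The hook pattern described here is an optional function pointer with a generic fallback. The sketch below is a user-space approximation: machdep_hooks, memory_block_size_for(), the example override and the default size are all assumptions chosen for illustration.

```c
#include <stddef.h>

/* Illustrative default only; the kernel's generic value may differ. */
#define DEFAULT_MEMORY_BLOCK_SIZE (16UL * 1024 * 1024)

/* Hypothetical slice of a platform hook table. */
struct machdep_hooks {
    unsigned long (*memory_block_size)(void);   /* NULL: no preference */
};

/* Use the platform's hook when it provides one, else the default. */
unsigned long memory_block_size_for(const struct machdep_hooks *md)
{
    if (md != NULL && md->memory_block_size != NULL)
        return md->memory_block_size();
    return DEFAULT_MEMORY_BLOCK_SIZE;
}

/* Example override that a platform could install. */
unsigned long example_platform_block_size(void)
{
    return 256UL * 1024 * 1024;
}
```

A platform that leaves the pointer NULL keeps the generic behaviour, so only platforms that care need to provide the hook.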
    • W
      powerpc/eeh: Skip eeh sysfs when eeh is disabled · 2213fb14
      Wei Yang committed
      When EEH is not enabled and two PCI devices are hotplugged on the same bus,
      the EEH-related sysfs entries are added twice for the first added PCI device,
      since the eeh_dev is not created when EEH is not enabled.
      
      This patch adds the check: if EEH is not enabled, the EEH sysfs entries will
      not be created.
      
      After applying this patch, the following warnings are reduced:
      
      sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_mode'
      sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_config_addr'
      sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_pe_config_addr'
      Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
      Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      2213fb14
    • B
      powerpc/serial: Use saner flags when creating legacy ports · c4cad90f
      Benjamin Herrenschmidt committed
      We had a mix & match of flags used when creating legacy ports
      depending on where we found them in the device-tree. Among others
      we were missing UPF_SKIP_TEST for some kinds of ISA ports, which is
      a problem as quite a few UARTs out there don't support the loopback
      test (such as a lot of BMCs).
      
      Let's pick the set of flags used by the SoC code and generalize it
      which means autoconf, no loopback test, irq maybe shared and fixed
      port.
      
      Sending to stable as the lack of UPF_SKIP_TEST is breaking
      serial on some machines, so I want this back in distros.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: stable@vger.kernel.org
      c4cad90f
  6. 30 May, 2014 8 commits
    • A
      KVM: PPC: Book3S PR: Rework SLB switching code · d8d164a9
      Alexander Graf committed
      On LPAR guest systems Linux enables the shadow SLB to indicate to the
      hypervisor a number of SLB entries that always have to be available.
      
      Today we go through this shadow SLB and disable all ESID's valid bits.
      However, pHyp doesn't like this approach very much and honors us with
      fancy machine checks.
      
      Fortunately the shadow SLB descriptor also has an entry that indicates
      the number of valid entries following. During the lifetime of a guest
      we can just swap that value to 0 and don't have to worry about the
      SLB restoration magic.
      
      While we're touching the code, let's also make it more readable (get
      rid of rldicl), allow it to deal with a dynamic number of bolted
      SLB entries and only do shadow SLB swizzling on LPAR systems.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      d8d164a9
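The "swap the count to 0" idea can be shown with a small model of a shadow SLB descriptor. The layout, the field names and the entry count are assumptions for this sketch and do not reflect the real descriptor format.

```c
#include <stdint.h>

#define SHADOW_SLB_ENTRIES 3   /* illustrative size only */

/* Hypothetical shadow SLB: a count followed by the entries it covers. */
struct shadow_slb_model {
    uint32_t nr_valid;   /* number of valid entries that follow */
    struct {
        uint64_t esid;
        uint64_t vsid;
    } entry[SHADOW_SLB_ENTRIES];
};

/* While the guest runs: hide all entries by zeroing the count only. */
uint32_t shadow_slb_hide(struct shadow_slb_model *s)
{
    uint32_t saved = s->nr_valid;

    s->nr_valid = 0;
    return saved;
}

/* Back in the host: put the original count back. */
void shadow_slb_restore(struct shadow_slb_model *s, uint32_t saved)
{
    s->nr_valid = saved;
}
```

The entries themselves are never touched, which is what avoids clearing each ESID's valid bit.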
    • A
      PPC: ePAPR: Fix hypercall on LE guest · 235959be
      Alexander Graf committed
      We get an array of instructions from the hypervisor via device tree that
      we write into a buffer that gets executed whenever we want to make an
      ePAPR compliant hypercall.
      
      However, the hypervisor passes us these instructions in BE order which
      we have to manually convert to LE when we want to run them in LE mode.
      
      With this fixup in place, I can successfully run LE kernels with KVM
      PV enabled on PR KVM.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      235959be
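Converting big-endian instruction words into the byte order of the running kernel can be sketched as below. The function names are hypothetical and the byte-wise decode is a portable stand-in for the kernel's own endian helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one 32-bit big-endian word from raw bytes into host order. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Copy n instruction words from a big-endian byte stream (as handed over
 * via the device tree) into a buffer of host-order words.
 */
void copy_hcall_insts(uint32_t *dst, const uint8_t *src_be, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = load_be32(src_be + 4 * i);
}
```

On a big-endian host the decode is a no-op in effect; on a little-endian host it performs the swap the commit describes.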
    • A
      KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler · ddca156a
      Aneesh Kumar K.V committed
      Use make_dsisr instead of open coding it. This also has
      the added benefit of handling alignment interrupts on additional
      instructions.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      ddca156a
    • A
      PPC: KVM: Make NX bit available with magic page · 5c165aec
      Alexander Graf committed
      Because old kernels enable the magic page and then choke on NXed trampoline
      code we have to disable NX by default in KVM when we use the magic page.
      
      However, since commit b18db0b8 we have successfully fixed that and can now
      leave NX enabled, so tell the hypervisor about this.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      5c165aec
    • A
      KVM: PPC: Book3S PR: Expose TAR facility to guest · e14e7a1e
      Alexander Graf committed
      POWER8 implements a new register called TAR. This register has to be
      enabled in FSCR and then from KVM's point of view is mere storage.
      
      This patch enables the guest to use TAR.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      e14e7a1e
    • A
      KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR · 616dff86
      Alexander Graf committed
      POWER8 introduced a new interrupt type called "Facility unavailable interrupt"
      which contains its status message in a new register called FSCR.
      
      Handle these exits and try to emulate instructions for unhandled facilities.
      Follow-on patches enable KVM to expose specific facilities into the guest.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      616dff86
    • A
      KVM: PPC: Make shared struct aka magic page guest endian · 5deb8e7a
      Alexander Graf committed
      The shared (magic) page is a data structure that contains often used
      supervisor privileged SPRs accessible via memory to the user to reduce
      the number of exits we have to take to read/write them.
      
      When we actually share this structure with the guest we have to maintain
      it in guest endianness, because some of the patch tricks only work with
      native endian load/store operations.
      
      Since we only share the structure with either host or guest in little
      endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv.
      
      For booke, the shared struct stays big endian. For book3s_64 hv we maintain
      the struct in host native endian, since it never gets shared with the guest.
      
      For book3s_64 pr we introduce a variable that tells us which endianness the
      shared struct is in and route every access to it through helper inline
      functions that evaluate this variable.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      5deb8e7a
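Routing every access through helpers that consult an endianness flag might look like this in miniature. The struct, its single msr field and the accessor names are assumptions for the sketch; the real shared page has many fields and its own generated accessors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reverse the byte order of a 64-bit value. */
static uint64_t swap64(uint64_t v)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* Detect the byte order this code is running in. */
static bool host_is_big_endian(void)
{
    const uint16_t probe = 1;

    return *(const uint8_t *)&probe == 0;
}

/* Hypothetical shared struct with one field, kept in guest byte order. */
struct shared_model {
    bool big_endian;   /* byte order the guest expects */
    uint64_t msr;      /* stored in guest byte order */
};

uint64_t shared_get_msr(const struct shared_model *s)
{
    return s->big_endian == host_is_big_endian() ? s->msr : swap64(s->msr);
}

void shared_set_msr(struct shared_model *s, uint64_t v)
{
    s->msr = s->big_endian == host_is_big_endian() ? v : swap64(v);
}
```

Callers always see host-order values, while the memory the guest reads stays in the guest's byte order.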
    • A
      KVM: PPC: BOOK3S: PR: Enable Little Endian PR guest · e5ee5422
      Aneesh Kumar K.V committed
      This patch makes sure we inherit the LE bit correctly in the different cases
      so that we can run a Little Endian distro in PR mode.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      e5ee5422
  7. 28 May, 2014 10 commits
    • S
      powerpc: Fix regression of per-CPU DSCR setting · 1739ea9e
      Sam Bobroff committed
      Since commit "efcac658 powerpc: Per process DSCR + some fixes (try#4)"
      it is no longer possible to set the DSCR on a per-CPU basis.
      
      The old behaviour was to manipulate the DSCR SPR directly but this is no
      longer sufficient: the value is quickly overwritten by context switching.
      
      This patch stores the per-CPU DSCR value in a kernel variable rather than
      directly in the SPR and it is used whenever a process has not set the DSCR
      itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged.
      
      Writes to the old global default (/sys/devices/system/cpu/dscr_default)
      now set all of the per-CPU values and reads return the last written value.
      
      The new per-CPU default is added to the paca_struct and is used everywhere
      outside of sysfs.c instead of the old global default.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1739ea9e
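The relationship between the global default, the per-CPU defaults and a task's own DSCR can be modelled as follows. The types, the has_own flag and the function names are invented for this sketch; the kernel keeps the per-CPU default in the paca and exposes it via sysfs.

```c
#include <stdbool.h>

#define NR_MODEL_CPUS 4   /* arbitrary size for the sketch */

/* Hypothetical model of the DSCR defaults. */
struct dscr_model {
    unsigned long global_default;   /* last value written to dscr_default */
    unsigned long cpu_default[NR_MODEL_CPUS];
};

/* Hypothetical per-task state. */
struct task_dscr {
    bool has_own;          /* task has set the DSCR itself */
    unsigned long dscr;    /* value chosen by the task */
};

/* A write to the global default sets every per-CPU value. */
void write_global_default(struct dscr_model *m, unsigned long val)
{
    m->global_default = val;
    for (int i = 0; i < NR_MODEL_CPUS; i++)
        m->cpu_default[i] = val;
}

/* A write to one CPU's file only changes that CPU's default. */
void write_cpu_default(struct dscr_model *m, int cpu, unsigned long val)
{
    m->cpu_default[cpu] = val;
}

/* Value to load on a context switch to task t on the given CPU. */
unsigned long effective_dscr(const struct dscr_model *m, int cpu,
                             const struct task_dscr *t)
{
    return t->has_own ? t->dscr : m->cpu_default[cpu];
}
```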
    • S
      powerpc: Split __SYSFS_SPRSETUP macro · 39a360ef
      Sam Bobroff committed
      Split the __SYSFS_SPRSETUP macro into two parts so that registers requiring
      custom read and write functions can use common code for their show and store
      functions.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      39a360ef
    • R
      arch: powerpc/fadump: Cleaning up inconsistent NULL checks · b717d985
      Rickard Strandqvist committed
      Cleaning up inconsistent NULL checks.
      There is otherwise a risk of a possible null pointer dereference.
      
      Was largely found by using a static code analysis program called cppcheck.
      Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      b717d985
    • M
      powerpc: Check cpu_thread_in_subcore() in __cpu_up() · 6f5e40a3
      Michael Ellerman committed
      To support split core we need to change the check in __cpu_up() that
      determines if a cpu is allowed to come online.
      
      Currently we refuse to online cpus which are not the primary thread
      within their core.
      
      On POWER8 with split core support this check needs to instead refuse to
      online cpus which are not the primary thread within their *sub* core.
      
      On POWER7 and other systems that do not support split core,
      threads_per_subcore == threads_per_core and so the check is equivalent.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      6f5e40a3
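The thread-position test behind this check can be sketched with simple bit masking. The function names are hypothetical, the sketch assumes threads_per_subcore is a power of two, and it models only the position test, not any other condition in __cpu_up().

```c
#include <stdbool.h>

/* Index of a cpu within its subcore (power-of-two subcore size assumed). */
int thread_in_subcore(int cpu, int threads_per_subcore)
{
    return cpu & (threads_per_subcore - 1);
}

/* True when cpu is the primary (first) thread of its subcore. */
bool is_subcore_primary(int cpu, int threads_per_subcore)
{
    return thread_in_subcore(cpu, threads_per_subcore) == 0;
}
```

With threads_per_subcore equal to threads_per_core (8 on an unsplit POWER8 core) this reduces to the old per-core test; with a core split in two, cpu 4 becomes a primary thread as well.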
    • M
      powerpc: Add threads_per_subcore · 5853aef1
      Michael Ellerman committed
      On POWER8 we have a new concept of a subcore. This is what happens when
      you take a regular core and split it. A subcore is a grouping of two or
      four SMT threads, as well as a handful of SPRs which allows the subcore
      to appear as if it were a core from the point of view of a guest.
      
      Unlike threads_per_core which is fixed at boot, threads_per_subcore can
      change while the system is running. Most code will not want to use
      threads_per_subcore.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      5853aef1
    • M
      powerpc/powernv: Make it possible to skip the IRQHAPPENED check in power7_nap() · 8d6f7c5a
      Michael Ellerman committed
      To support split core we need to be able to force all secondaries into
      nap, so the core can detect they are idle and do an unsplit.
      
      Currently power7_nap() will return without napping if there is an irq
      pending. We want to ignore the pending irq and nap anyway; we will deal
      with the interrupt later.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      8d6f7c5a
    • M
      powerpc/kvm/book3s_hv: Rework the secondary inhibit code · 441c19c8
      Michael Ellerman committed
      As part of the support for split core on POWER8, we want to be able to
      block splitting of the core while KVM VMs are active.
      
      The logic to do that would be exactly the same as the code we currently
      have for inhibiting onlining of secondaries.
      
      Instead of adding an identical mechanism to block split core, rework the
      secondary inhibit code to be a "HV KVM is active" check. We can then use
      that in both the cpu hotplug code and the upcoming split core code.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Acked-by: Alexander Graf <agraf@suse.de>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      441c19c8
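One way to picture a shared "HV KVM is active" check is a counter of active VMs that both users consult. The tracker type and all function names below are hypothetical, and the plain int is a single-threaded simplification; a real implementation would need proper synchronisation.

```c
#include <stdbool.h>

/* Hypothetical tracker; single-threaded simplification. */
struct hv_kvm_tracker {
    int active_vms;
};

void hv_vm_activated(struct hv_kvm_tracker *t)
{
    t->active_vms++;
}

void hv_vm_deactivated(struct hv_kvm_tracker *t)
{
    if (t->active_vms > 0)
        t->active_vms--;
}

bool hv_kvm_is_active(const struct hv_kvm_tracker *t)
{
    return t->active_vms > 0;
}

/* Both users ask the same question instead of keeping separate state. */
bool may_online_secondary(const struct hv_kvm_tracker *t)
{
    return !hv_kvm_is_active(t);
}

bool may_split_core(const struct hv_kvm_tracker *t)
{
    return !hv_kvm_is_active(t);
}
```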
    • N
      powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES · 64bb80d8
      Nishanth Aravamudan committed
      Based off fd1197f1 for ia64, enable CONFIG_HAVE_MEMORYLESS_NODES if
      NUMA. Initialize the local memory node in start_secondary.
      
      With this commit and the preceding one to enable
      CONFIG_USE_PERCPU_NUMA_NODE_ID, which is a prerequisite, in a PowerKVM
      guest with the following topology:
      
      numactl --hardware
      available: 3 nodes (0-2)
      node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
      23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
      47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
      71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
      95 96 97 98 99
      node 0 size: 1998 MB
      node 0 free: 521 MB
      node 1 cpus: 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
      115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
      133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
      151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
      169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186
      187 188 189 190 191 192 193 194 195 196 197 198 199
      node 1 size: 0 MB
      node 1 free: 0 MB
      node 2 cpus:
      node 2 size: 2039 MB
      node 2 free: 1739 MB
      node distances:
      node   0   1   2
        0:  10  40  40
        1:  40  10  40
        2:  40  40  10
      
      the unreclaimable slab is reduced by close to 130M:
      
      Before:
              Slab:             418176 kB
              SReclaimable:      26624 kB
              SUnreclaim:       391552 kB
      
      After:
              Slab:             298944 kB
              SReclaimable:      31744 kB
              SUnreclaim:       267200 kB
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      64bb80d8
    • N
      powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID · 8c272261
      Nishanth Aravamudan committed
      Based off 3bccd996 for ia64, convert powerpc to use the generic per-CPU
      topology tracking, specifically:
      
          initialize per cpu numa_node entry in start_secondary
          remove the powerpc cpu_to_node()
          define CONFIG_USE_PERCPU_NUMA_NODE_ID if NUMA
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      8c272261
    • S
      powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode · 011e4b02
      Srivatsa S. Bhat committed
      If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
      (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
      get the following messages during boot:
      
      [    0.089866] POWER8 performance monitor hardware support registered
      [    0.089985] power8-pmu: PMAO restore workaround active.
      [    5.095419] Processor 1 is stuck.
      [   10.097933] Processor 2 is stuck.
      [   15.100480] Processor 3 is stuck.
      [   20.102982] Processor 4 is stuck.
      [   25.105489] Processor 5 is stuck.
      [   30.108005] Processor 6 is stuck.
      [   35.110518] Processor 7 is stuck.
      [   40.113369] Processor 9 is stuck.
      [   45.115879] Processor 10 is stuck.
      [   50.118389] Processor 11 is stuck.
      [   55.120904] Processor 12 is stuck.
      [   60.123425] Processor 13 is stuck.
      [   65.125970] Processor 14 is stuck.
      [   70.128495] Processor 15 is stuck.
      [   75.131316] Processor 17 is stuck.
      
      Note that only the sibling threads are stuck, while the primary threads (0, 8,
      16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
      that kexec tries to wakeup (bring online) the sibling threads of all the cores,
      before performing kexec:
      
      [ 9464.131231] Starting new kernel
      [ 9464.148507] kexec: Waking offline cpu 1.
      [ 9464.148552] kexec: Waking offline cpu 2.
      [ 9464.148600] kexec: Waking offline cpu 3.
      [ 9464.148636] kexec: Waking offline cpu 4.
      [ 9464.148671] kexec: Waking offline cpu 5.
      [ 9464.148708] kexec: Waking offline cpu 6.
      [ 9464.148743] kexec: Waking offline cpu 7.
      [ 9464.148779] kexec: Waking offline cpu 9.
      [ 9464.148815] kexec: Waking offline cpu 10.
      [ 9464.148851] kexec: Waking offline cpu 11.
      [ 9464.148887] kexec: Waking offline cpu 12.
      [ 9464.148922] kexec: Waking offline cpu 13.
      [ 9464.148958] kexec: Waking offline cpu 14.
      [ 9464.148994] kexec: Waking offline cpu 15.
      [ 9464.149030] kexec: Waking offline cpu 17.
      
      Instrumenting this piece of code revealed that the cpu_up() operation actually
      fails with -EBUSY. Thus, only the primary threads of all the cores are online
      during kexec, and hence this is a sure-shot recipe for disaster, as explained
      in commit e8e5c215 (powerpc/kexec: Fix orphaned offline CPUs across kexec),
      as well as in the comment above wake_offline_cpus().
      
      It turns out that cpu_up() was returning -EBUSY because the variable
      'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
      by migrate_to_reboot_cpu() inside kernel_kexec().
      
      Now, migrate_to_reboot_cpu() was originally written with the assumption that
      any further code will not need to perform CPU hotplug, since we are anyway in
      the reboot path. However, kexec is clearly not such a case, since we depend on
      onlining CPUs, at least on powerpc.
      
      So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
      kexec path, to fix this regression in kexec on powerpc.
      
      Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
      can catch such issues more easily in the future.
      
      Fixes: c97102ba (kexec: migrate to reboot cpu)
      Cc: stable@vger.kernel.org
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      011e4b02
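The failure mode and the fix can be replayed with a toy hotplug model. All names here (hotplug_model, model_cpu_up() and so on) are stand-ins invented for the sketch; only the -EBUSY behaviour while hotplug is disabled is taken from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_MODEL_CPUS 4   /* arbitrary size for the sketch */

/* Toy model of the hotplug state involved. */
struct hotplug_model {
    bool disabled;                 /* stands in for cpu_hotplug_disabled */
    bool online[NR_MODEL_CPUS];
};

/* Stand-in for cpu_up(): refuses while hotplug is disabled. */
int model_cpu_up(struct hotplug_model *h, int cpu)
{
    if (h->disabled)
        return -EBUSY;
    h->online[cpu] = true;
    return 0;
}

/* Stand-in for migrate_to_reboot_cpu(): leaves hotplug disabled. */
void model_migrate_to_reboot_cpu(struct hotplug_model *h)
{
    h->disabled = true;
}

/*
 * Model of the kexec path. With reenable_hotplug set, hotplug is turned
 * back on after the migration, as the fix does. Returns the number of
 * offline CPUs that could not be brought online.
 */
int kexec_wake_offline(struct hotplug_model *h, bool reenable_hotplug)
{
    int failed = 0;

    model_migrate_to_reboot_cpu(h);
    if (reenable_hotplug)
        h->disabled = false;

    for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
        if (!h->online[cpu] && model_cpu_up(h, cpu) != 0)
            failed++;
    return failed;
}
```

In the model, the path without re-enabling leaves every sibling thread offline, which corresponds to the "Processor X is stuck" messages in the new kernel.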