1. 17 May, 2018 1 commit
    • powerpc: Allow LD_DEAD_CODE_DATA_ELIMINATION to be selected · 4c1d9bb0
      Committed by Nicholas Piggin
      This requires further changes to linker script to KEEP some tables
      and wildcard compiler generated sections into the right place. This
      includes ppc32 modifications from Christophe Leroy.
      
      When compiling powernv_defconfig with this option, the resulting
      kernel is almost 400kB smaller (and still boots):
      
          text      data       bss        dec   filename
      11827621   4810490   1341080   17979191   vmlinux
      11752437   4598858   1338776   17690071   vmlinux.dcde
      
      Mathieu's numbers for a custom Mac Mini G4 config show almost 200kB
      of savings. It also had some increase in vmlinux size for as-yet
      unknown reasons.
      
          text      data       bss        dec   filename
       7461457   2475122   1428064   11364643   vmlinux
       7386425   2364370   1425432   11176227   vmlinux.dcde
      
      Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [8xx]
      Tested-by: Mathieu Malaterre <malat@debian.org> [32-bit powermac]
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
  2. 27 Apr, 2018 2 commits
  3. 25 Apr, 2018 2 commits
    • powerpc: Fix smp_send_stop NMI IPI handling · ac61c115
      Committed by Nicholas Piggin
      The NMI IPI handler for a receiving CPU increments nmi_ipi_busy_count
      over the handler function call, which causes later smp_send_nmi_ipi()
      callers to spin until the call is finished.
      
      The stop_this_cpu() function never returns, so the busy count is never
      decremented, which can cause the system to hang in some cases. For
      example panic() will call smp_send_stop() early on which calls
      stop_this_cpu() on other CPUs, then later in the reboot path,
      pnv_restart() will call smp_send_stop() again, which hangs.
      
      Fix this by adding a special case to the stop_this_cpu() handler to
      decrement the busy count, because it will never return.
      
      Now that the NMI/non-NMI versions of stop_this_cpu() are different,
      split them out into separate functions rather than doing #ifdef tricks
      to share the body between the two functions.
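
The busy-count problem described above can be sketched as a toy model. This is an illustration only, not kernel code: the class and helper names are hypothetical, and "never returns" is modelled by a handler reporting that it did not return.

```python
class NmiIpiState:
    """Toy model of the busy counter described above (not kernel code)."""

    def __init__(self):
        self.busy_count = 0

    def run_handler(self, handler):
        # The receiving CPU bumps the count for the duration of the handler.
        self.busy_count += 1
        returned = handler(self)
        if returned:
            self.busy_count -= 1


def stop_this_cpu_old(state):
    # Never returns in the kernel; modelled here by reporting "did not return".
    return False


def stop_this_cpu_fixed(state):
    # The fix: drop the count ourselves, because control never comes back.
    state.busy_count -= 1
    return False


def later_sender_would_spin(state):
    # A later smp_send_nmi_ipi() caller waits until the count reaches zero.
    return state.busy_count != 0
```

With the old handler the count stays at 1 and a later sender would spin; with the fixed handler it returns to 0.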
      
      Fixes: 6bed3237 ("powerpc: use NMI IPI for smp_send_stop")
      Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Split out the functions, tweak change log a bit]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops · 682e6b4d
      Committed by Nicholas Piggin
      The OPAL RTC driver does not sleep when it gets OPAL_BUSY or
      OPAL_BUSY_EVENT from firmware, which causes large scheduling
      latencies. Up to 50 seconds have been observed here when the RTC
      stops responding (a BMC reboot can do it).
      
      Fix this by converting it to the standard form OPAL_BUSY loop that
      sleeps.
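
A rough sketch of what a sleeping busy loop of this kind could look like, written as a user-space simulation. The return-code values, the 10ms delay constant, and the `call`/`poll_events` callables are assumptions made for illustration; the real driver is kernel C and is not reproduced here.

```python
import time

# Illustrative return codes and delay; treat the exact values as assumptions.
OPAL_SUCCESS = 0
OPAL_BUSY = -2
OPAL_BUSY_EVENT = -12
OPAL_BUSY_DELAY_S = 0.010


def opal_call_with_retry(call, poll_events, sleep=time.sleep):
    """Retry `call` until firmware stops reporting busy, sleeping between tries."""
    rc = OPAL_BUSY
    while rc in (OPAL_BUSY, OPAL_BUSY_EVENT):
        rc = call()
        if rc == OPAL_BUSY_EVENT:
            sleep(OPAL_BUSY_DELAY_S)
            poll_events()  # give firmware a chance to process pending events
        elif rc == OPAL_BUSY:
            sleep(OPAL_BUSY_DELAY_S)
    return rc
```

The point of the sleep is that the scheduler can run other tasks while firmware is busy, instead of the caller spinning on the firmware call.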
      
      Fixes: 628daa8d ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
      Cc: stable@vger.kernel.org # v3.2+
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 24 Apr, 2018 6 commits
    • powerpc/mce: Fix a bug where mce loops on memory UE. · 75ecfb49
      Committed by Mahesh Salgaonkar
      The current code extracts the physical address for UE errors and then
      hooks it up into the memory failure infrastructure. On successful
      extraction of the physical address it wrongly sets "handled = 1", which
      means this UE error has been recovered. Since the MCE handler gets a
      return value of handled = 1, it assumes that the error has been
      recovered and goes back to the same NIP. This causes the MCE interrupt
      to fire again and again in a loop, leading to a hard lockup.
      
      Also, initialize phys_addr to ULONG_MAX so that we don't end up
      queuing undesired page to hwpoison.
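
The loop described above can be modelled in a few lines. This is a hypothetical simulation, not the handler itself: the function name and the interrupt cap are invented for illustration.

```python
def run_faulting_task(handler_reports_recovered, max_interrupts=5):
    """Count machine checks taken by one load that always hits a bad page."""
    interrupts = 0
    while interrupts < max_interrupts:
        interrupts += 1  # the load at NIP raises a machine check
        if not handler_reports_recovered:
            return interrupts, "task signalled"  # the load is not retried
        # "handled = 1": execution resumes at the same NIP, which faults again
    return interrupts, "still looping"
```

Reporting the error as recovered keeps re-executing the faulting load until the cap is hit, whereas reporting it as not recovered ends after a single interrupt.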
      
      Without this patch we see:
        Severe Machine check interrupt [Recovered]
          NIP: [000000001002588c] PID: 7109 Comm: find
          Initiator: CPU
          Error type: UE [Load/Store]
            Effective address: 00007fffd2755940
            Physical address:  000020181a080000
        ...
        Severe Machine check interrupt [Recovered]
          NIP: [000000001002588c] PID: 7109 Comm: find
          Initiator: CPU
          Error type: UE [Load/Store]
            Effective address: 00007fffd2755940
            Physical address:  000020181a080000
        Severe Machine check interrupt [Recovered]
          NIP: [000000001002588c] PID: 7109 Comm: find
          Initiator: CPU
          Error type: UE [Load/Store]
            Effective address: 00007fffd2755940
            Physical address:  000020181a080000
        Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered
        Memory failure: 0x20181a08: already hardware poisoned
        Memory failure: 0x20181a08: already hardware poisoned
        Memory failure: 0x20181a08: already hardware poisoned
        Memory failure: 0x20181a08: already hardware poisoned
        Memory failure: 0x20181a08: already hardware poisoned
        Memory failure: 0x20181a08: already hardware poisoned
        ...
        Watchdog CPU:38 Hard LOCKUP
      
      After this patch we see:
      
        Severe Machine check interrupt [Not recovered]
          NIP: [00007fffaae585f4] PID: 7168 Comm: find
          Initiator: CPU
          Error type: UE [Load/Store]
            Effective address: 00007fffaafe28ac
            Physical address:  00002017c0bd0000
        find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4
        Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered
      
      Fixes: 01eaac2b ("powerpc/mce: Hookup ierror (instruction) UE errors")
      Fixes: ba41e1e1 ("powerpc/mce: Hookup derror (load/store) UE errors")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large address range · d0cf9b56
      Committed by Alistair Popple
      The NPU has a limited number of address translation shootdown (ATSD)
      registers and the GPU has limited bandwidth to process ATSDs. This can
      result in contention of ATSD registers leading to soft lockups on some
      threads, particularly when invalidating a large address range in
      pnv_npu2_mn_invalidate_range().
      
      At some threshold it becomes more efficient to flush the entire GPU
      TLB for the given MM context (PID) than individually flushing each
      address in the range. This patch will result in ranges greater than
      2MB being converted from 32+ ATSDs into a single ATSD which will flush
      the TLB for the given PID on each GPU.
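
The threshold arithmetic above can be checked with a short sketch. The 2MB threshold comes from the text; the 64K page size is an assumption (it is what makes 2MB correspond to 32 per-address ATSDs), and the function name is hypothetical.

```python
ATSD_THRESHOLD = 2 * 1024 * 1024  # 2MB, from the text above
PAGE_SIZE = 64 * 1024             # assumption: 64K pages


def atsds_needed(start, end):
    """Return (kind, count) of ATSDs issued per GPU for the range [start, end)."""
    size = end - start
    if size > ATSD_THRESHOLD:
        return ("pid", 1)  # one flush of everything for the PID
    pages = -(-size // PAGE_SIZE)  # ceiling division
    return ("va", pages)
```

A range of exactly 2MB still takes 32 per-address ATSDs under these assumptions; anything larger collapses to a single PID-wide flush.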
      
      Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Tested-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_context() callback parameters · a1409ada
      Committed by Alistair Popple
      There is a single npu context per set of callback parameters. Callers
      should be prevented from overwriting existing callback values so
      instead return an error if different parameters are passed.
      
      Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/npu: Add lock to prevent race in concurrent context init/destroy · 28a5933e
      Committed by Alistair Popple
      The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions
      are used to allocate/free contexts to allow address translation and
      shootdown by the NPU on a particular GPU. Context initialisation is
      implicitly safe as it is protected by the requirement mmap_sem be held
      in write mode, however pnv_npu2_destroy_context() does not require
      mmap_sem to be held and it is not safe to call with a concurrent
      initialisation for a different GPU.
      
      It was assumed the driver would ensure destruction was not called
      concurrently with initialisation. However the driver may be simplified
      by allowing concurrent initialisation and destruction for different
      GPUs. As npu context creation/destruction is not a performance
      critical path and the critical section is not large, a single spinlock
      is used for simplicity.
      
      Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/memtrace: Let the arch hotunplug code flush cache · 7fd6641d
      Committed by Balbir Singh
      Don't do this via custom code. Now that we have support in the
      arch hotplug/hotunplug code, rely on those routines to do the right
      thing.
      
      The existing flush doesn't work because it uses ppc64_caches.l1d.size
      instead of ppc64_caches.l1d.line_size.
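
To see why the stride matters, here is a small sketch of a flush loop that advances by a given number of bytes. The cache geometry and region size are hypothetical values chosen only for illustration.

```python
def lines_flushed(region_start, region_size, stride):
    """Addresses touched by a flush loop that advances by `stride` bytes."""
    return list(range(region_start, region_start + region_size, stride))


# Hypothetical geometry for illustration only.
L1D_SIZE = 32 * 1024
L1D_LINE_SIZE = 128
REGION = 64 * 1024

wrong = lines_flushed(0, REGION, L1D_SIZE)       # steps by the cache size
right = lines_flushed(0, REGION, L1D_LINE_SIZE)  # steps by the line size
```

Stepping by the whole cache size touches only a handful of addresses in the region, so almost every cache line is skipped.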
      
      Fixes: 9d5171a8 ("powerpc/powernv: Enable removal of memory for in memory tracing")
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Flush cache on memory hot(un)plug · fb5924fd
      Committed by Balbir Singh
      This patch adds support for flushing potentially dirty cache lines
      when memory is hot-plugged/hot-un-plugged. The support is currently
      limited to 64 bit systems.
      
      The bug was exposed when mappings for a device were actually
      hot-unplugged and plugged in back later. A similar issue was observed
      during the development of memtrace, but memtrace does its own
      flushing of region via a custom routine.
      
      These patches do a flush on both hotplug and unplug to clear any stale
      data in the cache w.r.t. mappings. There is a small race window where a
      clean cache line may be created again just prior to tearing down the
      mapping.
      
      The patches were tested by disabling the flush routines in memtrace
      and doing I/O on the trace file. The system immediately
      checkstops (quite reliably if, prior to the hot-unplug of the memtrace
      region, we memset the regions we are about to hot unplug). After these
      patches no custom flushing is needed in the memtrace code.
      
      Fixes: 9d5171a8 ("powerpc/powernv: Enable removal of memory for in memory tracing")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Acked-by: Reza Arbab <arbab@linux.ibm.com>
      Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 21 Apr, 2018 1 commit
  6. 19 Apr, 2018 2 commits
    • powerpc/kvm: Fix lockups when running KVM guests on Power8 · 56376c58
      Committed by Michael Ellerman
      When running KVM guests on Power8 we can see a lockup where one CPU
      stops responding. This often leads to a message such as:
      
        watchdog: CPU 136 detected hard LOCKUP on other CPUs 72
        Task dump for CPU 72:
        qemu-system-ppc R  running task    10560 20917  20908 0x00040004
      
      And then backtraces on other CPUs, such as:
      
        Task dump for CPU 48:
        ksmd            R  running task    10032  1519      2 0x00000804
        Call Trace:
          ...
          --- interrupt: 901 at smp_call_function_many+0x3c8/0x460
              LR = smp_call_function_many+0x37c/0x460
          pmdp_invalidate+0x100/0x1b0
          __split_huge_pmd+0x52c/0xdb0
          try_to_unmap_one+0x764/0x8b0
          rmap_walk_anon+0x15c/0x370
          try_to_unmap+0xb4/0x170
          split_huge_page_to_list+0x148/0xa30
          try_to_merge_one_page+0xc8/0x990
          try_to_merge_with_ksm_page+0x74/0xf0
          ksm_scan_thread+0x10ec/0x1ac0
          kthread+0x160/0x1a0
          ret_from_kernel_thread+0x5c/0x78
      
      This is caused by commit 8c1c7fb0 ("powerpc/64s/idle: avoid sync
      for KVM state when waking from idle"), which added a check in
      pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is
      already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and
      test of kvm_hstate.hwthread_req.
      
      The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when
      entering the guest, so it can then come out to cede with
      KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after
      setting hwthread_req to 1, but because hwthread_state is still
      KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we
      wake up from idle and won't go to kvm_start_guest. From there the
      thread will return somewhere garbage and crash.
      
      Fix it by skipping the store of hwthread_state, but not the test of
      hwthread_req, when coming out of idle. It's OK to skip the sync in
      that case because hwthread_req will have been set on the same thread,
      so there is no synchronisation required.
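
The difference between the buggy and fixed wakeup paths can be sketched as two small decision functions. This is a simplified model with illustrative constant values; the real logic is assembly in the idle wakeup path and the function names here are hypothetical.

```python
KVM_HWTHREAD_IN_KERNEL = 0  # illustrative values only
KVM_HWTHREAD_IN_KVM = 1


def wakeup_buggy(hwthread_state, hwthread_req):
    # Skips both the store and the test when the state is already IN_KERNEL.
    if hwthread_state == KVM_HWTHREAD_IN_KERNEL:
        return "return to kernel"
    return "kvm_start_guest" if hwthread_req else "return to kernel"


def wakeup_fixed(hwthread_state, hwthread_req):
    # Only the store of hwthread_state is skipped; hwthread_req is always tested.
    return "kvm_start_guest" if hwthread_req else "return to kernel"
```

The failing case is a thread that napped with hwthread_req set to 1 while hwthread_state was still KVM_HWTHREAD_IN_KERNEL: the buggy path never looks at the request.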
      
      Fixes: 8c1c7fb0 ("powerpc/64s/idle: avoid sync for KVM state when waking from idle")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Fix enabling bridge MMIO windows · 13a83eac
      Committed by Michael Neuling
      On boot we save the configuration space of PCIe bridges. We do this so
      that when we get an EEH event and everything gets reset, we can restore
      them.
      
      Unfortunately we save this state before we've enabled the MMIO space
      on the bridges. Hence if we have to reset the bridge, when we come back
      MMIO is not enabled and we end up taking a PE freeze when the driver
      starts accessing again.
      
      This patch forces the memory/MMIO and bus mastering on when restoring
      bridges on EEH. Ideally we'd do this correctly by saving the
      configuration space writes later, but that will have to come later in
      a larger EEH rewrite. For now we have this simple fix.
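
Forcing memory space and bus mastering on amounts to setting two bits in the PCI command register when writing the saved value back. The sketch below uses the standard PCI command bit values; the function name is hypothetical and this is not the patch itself.

```python
PCI_COMMAND_MEMORY = 0x2  # memory space enable
PCI_COMMAND_MASTER = 0x4  # bus mastering enable


def restore_bridge_command(saved_cmd):
    """Return the command word to write back when restoring a bridge."""
    return saved_cmd | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER
```

Even if the value saved at boot had both bits clear, the restored command word ends up with MMIO and bus mastering enabled.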
      
      The original bug can be triggered on a boston machine by doing:
        echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
      On boston, this PHB has a PCIe switch on it. Without this patch,
      you'll see two EEH events: one expected, and one that is the failure
      we are fixing here. The second EEH event causes anything under the
      PHB to disappear (i.e. the i40e eth).
      
      With this patch, only 1 EEH event occurs and devices properly recover.
      
      Fixes: 652defed ("powerpc/eeh: Check PCIe link after reset")
      Cc: stable@vger.kernel.org # v3.11+
      Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Acked-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. 18 Apr, 2018 1 commit
  8. 17 Apr, 2018 1 commit
  9. 16 Apr, 2018 1 commit
    • powerpc/lib: Fix off-by-one in alternate feature patching · b8858581
      Committed by Michael Ellerman
      When we patch an alternate feature section, we have to adjust any
      relative branches that branch out of the alternate section.
      
      But currently we have a bug if we have a branch that points to past
      the last instruction of the alternate section, eg:
      
        FTR_SECTION_ELSE
        1:     b       2f
               or      6,6,6
        2:
        ALT_FTR_SECTION_END(...)
               nop
      
      This will result in a relative branch at 1 with a target that equals
      the end of the alternate section.
      
      That branch does not need adjusting when it's moved to the non-else
      location. Currently we do adjust it, resulting in a branch that goes
      off into the link-time location of the else section, which is junk.
      
      The fix is to not patch branches that have a target == end of the
      alternate section.
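
The off-by-one comes down to whether the end address of the alternate section counts as "inside" it. A minimal sketch of the two range checks, with hypothetical function names and addresses used purely for illustration:

```python
def needs_adjust_buggy(target, alt_start, alt_end):
    # Treats a target equal to alt_end as branching out of the section.
    return target < alt_start or target >= alt_end


def needs_adjust_fixed(target, alt_start, alt_end):
    # A target equal to alt_end moves with the section, so leave it alone.
    return target < alt_start or target > alt_end
```

For a two-instruction alternate section at 0x100..0x108, a branch to 0x108 (the label `2:` in the example above) is adjusted by the buggy check but left untouched by the fixed one.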
      
      Fixes: d20fe50a ("KVM: PPC: Book3S HV: Branch inside feature section")
      Fixes: 9b1a735d ("powerpc: Add logic to patch alternative feature sections")
      Cc: stable@vger.kernel.org # v2.6.27+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 14 Apr, 2018 3 commits
  11. 13 Apr, 2018 1 commit
    • powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features · 81b654c2
      Committed by Michael Ellerman
      The cpu_has_feature() mechanism has an optimisation where at build
      time we construct a mask of the CPU feature bits that will always be
      true for the given .config, based on the platform/bitness/etc. that we
      are building for.
      
      That is incompatible with DT CPU features, where the set of CPU
      features is dependent on feature flags that are given to us by
      firmware.
      
      The result is that some feature bits can not be *disabled* by DT CPU
      features. Or more accurately, they can be disabled but they will still
      appear in the ALWAYS mask, meaning cpu_has_feature() will always
      return true for them.
      
      In the past this hasn't really been a problem because on Book3S
      64 (where we support DT CPU features), the set of ALWAYS bits has been
      very small. That was because we always built for POWER4 and later,
      meaning the set of common bits was small.
      
      The only bit that could be cleared by DT CPU features that was also in
      the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in
      the alignment handler to create a fake DSISR. That code was itself
      deleted in 31bfdb03 ("powerpc: Use instruction emulation
      infrastructure to handle alignment faults") (Sep 2017).
      
      However the set of ALWAYS features changed with the recent commit
      db5ae1c1 ("powerpc/64s: Refine feature sets for little endian
      builds") which restricted the set of feature flags when building
      little endian to Power7 or later. That caused the ALWAYS mask to
      become much larger for little endian builds.
      
      The result is that the following feature bits can currently not
      be *disabled* by DT CPU features:
      
        CPU_FTR_REAL_LE, CPU_FTR_MMCRA, CPU_FTR_CTRL, CPU_FTR_SMT,
        CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR, CPU_FTR_PKEY,
        CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR.
      
      To fix it we need to mask the set of ALWAYS features with the base set
      of DT CPU features, ie. the features that are always enabled by DT CPU
      features. That way there are no bits in the ALWAYS mask that are not
      also always set by DT CPU features.
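
The masking idea can be illustrated with plain bit flags. The bit positions, the `CPU_FTR_A` placeholder, and the simplified `cpu_has_feature()` below are assumptions for illustration; only the feature names CPU_FTR_SMT and CPU_FTR_DSCR come from the list above.

```python
# Hypothetical bit assignments for illustration.
CPU_FTR_A = 1 << 0     # placeholder for a bit DT CPU features always enables
CPU_FTR_SMT = 1 << 1   # firmware may leave this off
CPU_FTR_DSCR = 1 << 2  # firmware may leave this off

BUILD_ALWAYS = CPU_FTR_A | CPU_FTR_SMT | CPU_FTR_DSCR
DT_BASE = CPU_FTR_A

# The fix: keep only the bits that DT CPU features also always sets.
ALWAYS_FIXED = BUILD_ALWAYS & DT_BASE


def cpu_has_feature(bit, always_mask, runtime_features):
    # Bits in the always mask short-circuit to True without a runtime check.
    return bool(bit & always_mask) or bool(bit & runtime_features)
```

With the unmasked ALWAYS set, a feature that firmware disabled still reports as present; after masking, the answer follows the runtime feature word.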
      
      Fixes: db5ae1c1 ("powerpc/64s: Refine feature sets for little endian builds")
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  12. 12 Apr, 2018 3 commits
    • powerpc/mm/radix: Fix checkstops caused by invalid tlbiel · 2675c13b
      Committed by Michael Ellerman
      In tlbiel_radix_set_isa300() we use the PPC_TLBIEL() macro to
      construct tlbiel instructions. The instruction takes 5 fields, two of
      which are registers, and the others are constants. But because it's
      constructed with inline asm the compiler doesn't know that.
      
      We got the constraint wrong on the 'r' field, using "r" tells the
      compiler to put the value in a register. The value we then get in the
      macro is the *register number*, not the value of the field.
      
      That means when we mask the register number with 0x1 we get 0 or 1
      depending on which register the compiler happens to put the constant
      in, eg:
      
        li      r10,1
        tlbiel  r8,r9,2,0,0
      
        li      r7,1
        tlbiel  r10,r6,0,0,1
      
      If we're unlucky we might generate an invalid instruction form, for
      example RIC=0, PRS=1 and R=0, tlbiel r8,r7,0,1,0, this has been
      observed to cause machine checks:
      
        Oops: Machine check, sig: 7 [#1]
        CPU: 24 PID: 0 Comm: swapper
        NIP:  00000000000385f4 LR: 000000000100ed00 CTR: 000000000000007f
        REGS: c00000000110bb40 TRAP: 0200
        MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 48002222  XER: 20040000
        CFAR: 00000000000385d0 DAR: 0000000000001c00 DSISR: 00000200 SOFTE: 1
      
      If the machine check happens early in boot while we have MSR_ME=0 it
      will escalate into a checkstop and kill the box entirely.
      
      To fix it we could change the inline asm constraint to "i" which
      tells the compiler the value is a constant. But a better fix is to just
      pass a literal 1 into the macro, which bypasses any problems with inline
      asm constraints.
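
The effect of masking a register number instead of a value can be reproduced with simple arithmetic. The helper name is hypothetical; the register numbers match the two disassembly examples above.

```python
def r_field_from_operand(operand):
    """Model the macro masking whatever it is handed with 0x1."""
    return operand & 0x1


# With an "r" constraint the macro sees the register number, not the value 1.
r_via_r10 = r_field_from_operand(10)  # constant happened to land in r10
r_via_r7 = r_field_from_operand(7)    # constant happened to land in r7
r_literal = r_field_from_operand(1)   # the fix: pass a literal 1
```

An even register number yields R=0 and an odd one yields R=1, which is why the generated instruction depended on register allocation.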
      
      Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
    • exec: pass stack rlimit into mm layout functions · 8f2af155
      Committed by Kees Cook
      Patch series "exec: Pin stack limit during exec".
      
      Attempts to solve problems with the stack limit changing during exec
      continue to be frustrated[1][2].  In addition to the specific issues
      around the Stack Clash family of flaws, Andy Lutomirski pointed out[3]
      other places during exec where the stack limit is used and is assumed to
      be unchanging.  Given the many places it gets used and the fact that it
      can be manipulated/raced via setrlimit() and prlimit(), I think the only
      way to handle this is to move away from the "current" view of the stack
      limit and instead attach it to the bprm, and plumb this down into the
      functions that need to know the stack limits.  This series implements
      the approach.
      
      [1] 04e35f44 ("exec: avoid RLIMIT_STACK races with prlimit()")
      [2] 779f4e1c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"")
      [3] to security@kernel.org, "Subject: existing rlimit races?"
      
      This patch (of 3):
      
      Since it is possible that the stack rlimit can change externally during
      exec (either via another thread calling setrlimit() or another process
      calling prlimit()), provide a way to pass the rlimit down into the
      per-architecture mm layout functions so that the rlimit can stay in the
      bprm structure instead of sitting in the signal structure until exec is
      finalized.
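
The idea of pinning the limit for the duration of exec can be sketched as taking one snapshot and passing it down explicitly. Everything here is a hypothetical user-space model: the `Bprm` class, the `pick_mmap_base()` helper, and the sizes are invented for illustration and do not mirror the kernel's layout code.

```python
from dataclasses import dataclass


@dataclass
class Bprm:
    rlim_stack: int  # snapshot taken once, when exec starts


def pick_mmap_base(rlim_stack, task_size=1 << 30, gap_min=1 << 20):
    """Hypothetical layout helper: place mmap_base below a stack gap."""
    gap = max(rlim_stack, gap_min)
    return task_size - gap


live_limit = {"stack": 8 << 20}               # stands in for the shared rlimit
bprm = Bprm(rlim_stack=live_limit["stack"])   # pinned at the start of exec
base_before = pick_mmap_base(bprm.rlim_stack)
live_limit["stack"] = 64 << 20                # another thread changes the limit
base_after = pick_mmap_base(bprm.rlim_stack)  # still uses the pinned value
```

Because the layout function only sees the value it is handed, a concurrent change to the live limit cannot alter decisions made partway through exec.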
      
      Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Cc: Brad Spengler <spender@grsecurity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, migrate: remove reason argument from new_page_t · 666feb21
      Committed by Michal Hocko
      No allocation callback is using this argument anymore.  new_page_node
      used to use this parameter to convey node_id resp.  migration error up
      to move_pages code (do_move_page_to_node_array).  The error status never
      made it into the final status field and we have a better way to
      communicate node id to the status field now.  All other allocation
      callbacks simply ignored the argument so we can drop it finally.
      
      [mhocko@suse.com: fix migration callback]
        Link: http://lkml.kernel.org/r/20180105085259.GH2801@dhcp22.suse.cz
      [akpm@linux-foundation.org: fix alloc_misplaced_dst_page()]
      [mhocko@kernel.org: fix build]
        Link: http://lkml.kernel.org/r/20180103091134.GB11319@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20180103082555.14592-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Andrea Reale <ar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 11 Apr, 2018 3 commits
    • KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode · 19ce7909
      Committed by Nicholas Piggin
      This crashes with a "Bad real address for load" attempting to load
      from the vmalloc region in realmode (faulting address is in DAR).
      
        Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
        LE SMP NR_CPUS=2048 NUMA PowerNV
        CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted 4.16.0-01530-g43d1859f0994
        NIP:  c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
        REGS: c000000fff76dd80 TRAP: 0200   Not tainted  (4.16.0-01530-g43d1859f0994)
        MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 48082222  XER: 00000000
        CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
        NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
        LR [c0000000000c2430] do_tlbies+0x230/0x2f0
      
      I suspect the reason is the per-cpu data is not in the linear chunk.
      This could be restored if that was able to be fixed, but for now,
      just remove the tracepoints.
      
      Fixes: 0428491c ("powerpc/mm: Trace tlbie(l) instructions")
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/8xx: Fix build with hugetlbfs enabled · 032900e6
      Committed by Aneesh Kumar K.V
      8xx uses the slice code when hugetlbfs is enabled. We missed a header
      include on 8xx which resulted in the below build failure:
      
        config: mpc885_ads_defconfig + CONFIG_HUGETLBFS
      
        arch/powerpc/mm/slice.c: In function 'slice_get_unmapped_area':
        arch/powerpc/mm/slice.c:655:2: error: implicit declaration of function 'need_extra_context'
        arch/powerpc/mm/slice.c:656:3: error: implicit declaration of function 'alloc_extended_context'
      
      On PPC64, mmu_context.h was included via linux/pkeys.h.
      
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops · 3b807033
      Committed by Nicholas Piggin
      The OPAL NVRAM driver does not sleep in case it gets OPAL_BUSY or
      OPAL_BUSY_EVENT from firmware, which causes large scheduling
      latencies, and various lockup errors to trigger (again, BMC reboot
      can cause it).
      
      Fix this by converting it to the standard form OPAL_BUSY loop that
      sleeps.
      
      Fixes: 628daa8d ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
      Depends-on: 34dd25de ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
      Cc: stable@vger.kernel.org # v3.2+
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 10 Apr, 2018 3 commits
    • powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops · 34dd25de
      Committed by Nicholas Piggin
      This is the start of an effort to tidy up and standardise all the
      delays. Existing loops have a range of delay/sleep periods from 1ms
      to 20ms, and some have no delay. They all loop forever except rtc,
      which times out after 10 retries, and that uses 10ms delays. So use
      10ms as our standard delay. The OPAL maintainer agrees 10ms is a
      reasonable starting point.
      
      The idea is to use the same recipe everywhere, once this is proven to
      work then it will be documented as an OPAL API standard. Then both
      firmware and OS can agree, and if a particular call needs something
      else, then that can be documented with reasoning.
      
      This is not the end-all of this effort, it's just a relatively easy
      change that fixes some existing high latency delays. There should be
      provision for standardising timeouts and/or interruptible loops where
      possible, so non-fatal firmware errors don't cause hangs.
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/fscr: Enable interrupts earlier before calling get_user() · 709b973c
      Committed by Anshuman Khandual
      The function get_user() can sleep while trying to fetch an instruction
      from user address space, which causes the following warning from the
      scheduler.
      
      BUG: sleeping function called from invalid context
      
      Interrupts do get re-enabled, but only a bit later, after get_user()
      has been called. This change moves the enabling of interrupts earlier
      so that it covers the call to get_user(). While at it, let's check
      for kernel mode and crash, as this interrupt should not have been
      triggered from kernel context.
      
      Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Fix section mismatch warnings from setup_rfi_flush() · 501a78cb
      Committed by Michael Ellerman
      The recent LPM changes to setup_rfi_flush() are causing some section
      mismatch warnings because we removed the __init annotation on
      setup_rfi_flush():
      
        The function setup_rfi_flush() references
        the function __init ppc64_bolted_size().
        the function __init memblock_alloc_base().
      
      The references are actually in init_fallback_flush(), but that is
      inlined into setup_rfi_flush().
      
      These references are safe because:
       - only pseries calls setup_rfi_flush() at runtime
       - pseries always passes L1D_FLUSH_FALLBACK at boot
       - so the fallback flush area will always be allocated
       - so the check in init_fallback_flush() will always return early:
         /* Only allocate the fallback flush area once (at boot time). */
         if (l1d_flush_fallback_area)
         	return;
      
       - and therefore we won't actually call the freed init routines.
      
      We should rework the code to make it safer by default rather than
      relying on the above, but for now as a quick-fix just add a __ref
      annotation to squash the warning.
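
      The early-return argument above can be sketched in plain C. This is
      a minimal user-space illustration, not the kernel code: the ex_
      names, the call counter and the use of malloc() are assumptions
      made for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for l1d_flush_fallback_area. */
static void *ex_flush_area;

/* Counts how often the allocation path is actually taken. */
static int ex_alloc_calls;

/*
 * Allocate the fallback area on the first call only. Once the area
 * exists, every later call returns before reaching the allocator,
 * which is why a runtime caller never touches the boot-only code.
 */
static void ex_init_fallback_flush(size_t size)
{
	if (ex_flush_area)
		return;

	ex_alloc_calls++;
	ex_flush_area = malloc(size);
}
```

      Calling ex_init_fallback_flush() a second time leaves
      ex_alloc_calls unchanged, provided the first allocation succeeded.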
      
      Fixes: abf110f3 ("powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      501a78cb
  15. 09 Apr, 2018 1 commit
    • M
      powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic · 73aca179
      Michael Ellerman committed
      If you build the kernel with CONFIG_RELOCATABLE=n, then install the
      modules, rebuild the kernel with CONFIG_RELOCATABLE=y and leave the
      old modules installed, the kernel crashes with something like:
      
        Unable to handle kernel paging request for data at address 0xd000000018d66cef
        Faulting instruction address: 0xc0000000021ddd08
        Oops: Kernel access of bad area, sig: 11 [#1]
        Modules linked in: x_tables autofs4
        CPU: 2 PID: 1 Comm: systemd Not tainted 4.16.0-rc6-gcc_ubuntu_le-g99fec39e #1
        ...
        NIP check_version.isra.22+0x118/0x170
        Call Trace:
          __ksymtab_xt_unregister_table+0x58/0xfffffffffffffcb8 [x_tables] (unreliable)
          resolve_symbol+0xb4/0x150
          load_module+0x10e8/0x29a0
          SyS_finit_module+0x110/0x140
          system_call+0x58/0x6c
      
      This happens because since commit 71810db2 ("modversions: treat
      symbol CRCs as 32 bit quantities"), a relocatable kernel encodes and
      handles symbol CRCs differently from a non-relocatable kernel.
      
      Although it's possible we could try and detect this situation and
      handle it, it's much more robust to simply make the state of
      CONFIG_RELOCATABLE part of the module vermagic.
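
      The idea of folding a config option into the version magic can be
      sketched as below. This is an illustration only: the EX_ macro
      names, the sample magic string and the comparison helper are
      assumptions and do not reproduce the kernel's actual definitions.

```c
#include <string.h>

/* Define EX_CONFIG_RELOCATABLE at build time to model a relocatable kernel. */
#ifdef EX_CONFIG_RELOCATABLE
#define EX_VERMAGIC_RELOCATABLE "relocatable "
#else
#define EX_VERMAGIC_RELOCATABLE ""
#endif

/* Hypothetical magic string; adjacent string literals are concatenated. */
#define EX_VERMAGIC "4.16.0 SMP mod_unload " EX_VERMAGIC_RELOCATABLE

/* A module is accepted only if its magic matches the kernel's exactly. */
static int ex_same_vermagic(const char *module_magic, const char *kernel_magic)
{
	return strcmp(module_magic, kernel_magic) == 0;
}
```

      With this scheme, a module built against the non-relocatable
      configuration carries a different string from a relocatable kernel,
      so the mismatch is rejected at load time instead of crashing later.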
      
      Fixes: 71810db2 ("modversions: treat symbol CRCs as 32 bit quantities")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      73aca179
  16. 07 Apr, 2018 1 commit
  17. 06 Apr, 2018 3 commits
  18. 05 Apr, 2018 5 commits
    • N
      powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep · c1b25a17
      Nicholas Piggin committed
      POWER8 restores AMOR when waking from deep sleep, but POWER9 does not,
      because it does not go through the subcore restore.
      
      Have POWER9 restore it in core restore.
      
      Fixes: ee97b6b9 ("powerpc/mm/radix: Setup AMOR in HV mode to allow key 0")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c1b25a17
    • N
      powerpc/64s: Fix POWER9 DD2.2 and above in cputable features · 3a52f601
      Nicholas Piggin committed
      The CPU_FTR_POWER9_DD2_1 flag is intended to be set for DD2.1 and
      above (which is what the dt_cpu_ftrs setup does). Fix cputable for
      DD2.2 to match.
      
      This came about due to patches b5af4f27 ("powerpc: Add CPU feature
      bits for TM bug workarounds on POWER9 v2.2"), and 9e9626ed
      ("powerpc/64s: Fix POWER9 DD2.2 and above in DT CPU features") being
      in-flight at once. The latter patch fixed dt_cpu_ftrs like this one
      does. The former changed cputable to match dt_cpu_ftrs.
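
      The "DD2.1 and above" rule can be sketched as follows. The bit
      values, the EX_ names and the helper are hypothetical and only
      illustrate that a feature flag for one revision should also be set
      on every later revision.

```c
#include <stdint.h>

/* Hypothetical feature bits, not the kernel's CPU_FTR_* values. */
#define EX_FTR_DD2_1 (UINT64_C(1) << 0)
#define EX_FTR_DD2_2 (UINT64_C(1) << 1)

/* Return 1 if revision major.minor is at or above want_major.want_minor. */
static int ex_rev_at_least(unsigned major, unsigned minor,
			   unsigned want_major, unsigned want_minor)
{
	return major > want_major ||
	       (major == want_major && minor >= want_minor);
}

/* Each flag is cumulative: a DD2.2 part gets the DD2.1 flag as well. */
static uint64_t ex_features_for_dd(unsigned major, unsigned minor)
{
	uint64_t features = 0;

	if (ex_rev_at_least(major, minor, 2, 1))
		features |= EX_FTR_DD2_1;
	if (ex_rev_at_least(major, minor, 2, 2))
		features |= EX_FTR_DD2_2;

	return features;
}
```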
      
      Fixes: b5af4f27 ("powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3a52f601
    • N
      powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit · c130153e
      Nicholas Piggin committed
      The pkey code added a CPU_FTR_PKEY bit, but did not add it to the
      dt_cpu_ftrs feature set. Although the capability is supported by all
      processors in the base dt_cpu_ftrs set for 64s, it's a significant
      and sufficiently well defined feature to make it optional. So add
      it as a quirk for now, which can be versioned out and then controlled
      by the firmware (once dt_cpu_ftrs gains versioning support).
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Cc: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c130153e
    • N
      powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits · a57ac411
      Nicholas Piggin committed
      Presently the dt_cpu_ftrs restore_cpu will only add bits to the LPCR
      for secondaries, but some bits must be removed (e.g., UPRT for HPT).
      Not clearing these bits on secondaries causes checkstops when booting
      with disable_radix.
      
      restore_cpu cannot just set LPCR, because it is also called by the
      idle wakeup code which relies on opal_slw_set_reg to restore the value
      of LPCR, at least on P8 which does not save LPCR to stack in the idle
      code.
      
      Fix this by including a mask of bits to clear from LPCR as well, which
      is used by restore_cpu.
      
      This is a little messy now, but it's a minimal fix that can be
      backported.  Longer term, the idle SPR save/restore code can be
      reworked to completely avoid calls to restore_cpu, then restore_cpu
      would be able to unconditionally set LPCR to match boot processor
      environment.
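
      The set-and-clear approach described above can be sketched as a
      read-modify-write on a plain 64-bit value. This is only an
      illustration: the function name and the mask parameters are
      assumptions, and no real LPCR bit positions are used.

```c
#include <stdint.h>

/*
 * Compute a new register value from the current one: first drop the
 * bits listed in clr_mask, then add the bits listed in set_mask.
 * Bits named in neither mask keep whatever value they already had,
 * which is what lets a caller preserve a value restored elsewhere.
 */
static uint64_t ex_apply_lpcr_masks(uint64_t current, uint64_t set_mask,
				    uint64_t clr_mask)
{
	return (current & ~clr_mask) | set_mask;
}
```

      For example, ex_apply_lpcr_masks(0xF0, 0x01, 0x30) clears bits 4
      and 5, sets bit 0, and leaves bits 6 and 7 untouched.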
      
      Fixes: 5a61ef74 ("powerpc/64s: Support new device tree binding for discovering CPU features")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a57ac411
    • M
      Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead" · a67cc594
      Michael Ellerman committed
      As described in that commit:
      
        When stop is executed with EC=ESL=0, it appears to execute like a
        normal instruction (resuming from NIP when woken by interrupt). So
        all the save/restore handling can be avoided completely.
      
      This is true, except in the case of an NMI interrupt (sreset or
      machine check) interrupting the instruction. In that case, the NMI
      gets an "interrupt occurred while the processor was in power-saving
      mode" indication. The power-save wakeup code uses that bit to decide
      whether to restore some registers (e.g., LR). Because these are no
      longer saved, this causes random register corruption.
      
      It may be possible to restore this optimisation by detecting the case
      of no register loss on the wakeup side, and avoid restoring in that
      case, but that's not a minor fix because the wakeup code itself uses
      some registers that would be live (e.g., LR).
      
      Fixes: b9ee31e1 ("powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a67cc594