1. 01 Dec, 2019 16 commits
    • M
      powerpc/book3s64: Fix link stack flush on context switch · 0a60d4bd
      Michael Ellerman committed
      commit 39e72bf96f5847ba87cc5bd7a3ce0fed813dc9ad upstream.
      
      In commit ee13cb24 ("powerpc/64s: Add support for software count
      cache flush"), I added support for software to flush the count
      cache (indirect branch cache) on context switch if firmware told us
      that was the required mitigation for Spectre v2.
      
      As part of that code we also added a software flush of the link
      stack (return address stack), which protects against Spectre-RSB
      between user processes.
      
      That is all correct for CPUs that activate that mitigation, which is
      currently Power9 Nimbus DD2.3.
      
      What I got wrong is that on older CPUs, where firmware has disabled
      the count cache, we also need to flush the link stack on context
      switch.
      
      To fix it we create a new feature bit which is not set by firmware,
      which tells us we need to flush the link stack. We set that when
      firmware tells us that either of the existing Spectre v2 mitigations
      is enabled.
      
      Then we adjust the patching code so that if we see that feature bit we
      enable the link stack flush. If we're also told to flush the count
      cache in software then we fall through and do that also.
      
      On the older CPUs we don't need to do the software count cache
      flush, firmware has disabled it, so in that case we patch in an early
      return after the link stack flush.
      
      The naming of some of the functions is awkward after this patch,
      because they're called "count cache" but they also do link stack. But
      we'll fix that up in a later commit to ease backporting.
      
      This is the fix for CVE-2019-18660.
      Reported-by: Anthony Steinhauser <asteinhauser@google.com>
      Fixes: ee13cb24 ("powerpc/64s: Add support for software count cache flush")
      Cc: stable@vger.kernel.org # v4.4+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
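The decision described in this commit message can be sketched in C as follows. This is only an illustration under stated assumptions: the feature-bit names and the `flush_plan` enum are hypothetical and are not the kernel's identifiers; the logic follows the message's description (derive a link-stack bit from either firmware mitigation, then either fall through to the count cache flush or return early).

```c
#include <assert.h>

/* Hypothetical feature bits; the names are illustrative only. */
#define FTR_COUNT_CACHE_DISABLED (1u << 0) /* firmware disabled the count cache */
#define FTR_FLUSH_COUNT_CACHE    (1u << 1) /* firmware asks for a software flush */
#define FTR_FLUSH_LINK_STACK     (1u << 2) /* derived by software, never set by firmware */

enum flush_plan {
    FLUSH_NOTHING,
    FLUSH_LINK_STACK_THEN_RETURN,     /* early return after the link stack flush */
    FLUSH_LINK_STACK_AND_COUNT_CACHE, /* fall through to the count cache flush */
};

/* Set the derived bit when either firmware-reported mitigation is active. */
static unsigned int derive_features(unsigned int fw)
{
    if (fw & (FTR_COUNT_CACHE_DISABLED | FTR_FLUSH_COUNT_CACHE))
        fw |= FTR_FLUSH_LINK_STACK;
    return fw;
}

/* Choose what the context-switch path would be patched to do. */
static enum flush_plan choose_flush_plan(unsigned int ftrs)
{
    if (!(ftrs & FTR_FLUSH_LINK_STACK))
        return FLUSH_NOTHING;
    if (ftrs & FTR_FLUSH_COUNT_CACHE)
        return FLUSH_LINK_STACK_AND_COUNT_CACHE;
    return FLUSH_LINK_STACK_THEN_RETURN;
}
```

In this model, a CPU whose firmware has disabled the count cache gets the link stack flush followed by an early return, which is the case the commit message says was previously missed.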
    • C
      powerpc/64s: support nospectre_v2 cmdline option · 19d98b4d
      Christopher M. Riedl committed
      commit d8f0e0b073e1ec52a05f0c2a56318b47387d2f10 upstream.
      
      Add support for disabling the kernel implemented spectre v2 mitigation
      (count cache flush on context switch) via the nospectre_v2 and
      mitigations=off cmdline options.
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
      Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190524024647.381-1-cmr@informatik.wtf
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • D
      powerpc/powernv: hold device_hotplug_lock when calling device_online() · 3081ae5e
      David Hildenbrand committed
      [ Upstream commit cec1680591d6d5b10ecc10f370210089416e98af ]
      
      device_online() should be called with device_hotplug_lock() held.
      
      Link: http://lkml.kernel.org/r/20180925091457.28651-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Rashmica Gupta <rashmica.g@gmail.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: John Allen <jallen@linux.vnet.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • D
      mm/memory_hotplug: make add_memory() take the device_hotplug_lock · 02735d59
      David Hildenbrand committed
      [ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ]
      
      add_memory() currently does not take the device_hotplug_lock, however
      is already called under the lock from
      	arch/powerpc/platforms/pseries/hotplug-memory.c
      	drivers/acpi/acpi_memhotplug.c
      to synchronize against CPU hot-remove and similar.
      
      In general, we should hold the device_hotplug_lock when adding memory to
      synchronize against online/offline request (e.g.  from user space) - which
      already resulted in lock inversions due to device_lock() and
      mem_hotplug_lock - see 30467e0b ("mm, hotplug: fix concurrent memory
      hot-add deadlock").  add_memory()/add_memory_resource() will create memory
      block devices, so this really feels like the right thing to do.
      
      Holding the device_hotplug_lock makes sure that a memory block device
      can really only be accessed (e.g. via .online/.state) from user space,
      once the memory has been fully added to the system.
      
      The lock is not held yet in
      	drivers/xen/balloon.c
      	arch/powerpc/platforms/powernv/memtrace.c
      	drivers/s390/char/sclp_cmd.c
      	drivers/hv/hv_balloon.c
      So, let's either use the locked variants or take the lock.
      
      Don't export add_memory_resource(), as it once was exported to be used by
      XEN, which is never built as a module.  If somebody requires it, we also
      have to export a locked variant (as device_hotplug_lock is never
      exported).
      
      Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: John Allen <jallen@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • J
      powerpc/xmon: Relax frame size for clang · 57aab8f0
      Joel Stanley committed
      [ Upstream commit 9c87156cce5a63735d1218f0096a65c50a7a32aa ]
      
      When building with clang (8 trunk, 7.0 release) the frame size limit is
      hit:
      
       arch/powerpc/xmon/xmon.c:452:12: warning: stack frame size of 2576
       bytes in function 'xmon_core' [-Wframe-larger-than=]
      
      Some investigation by Naveen indicates this is due to clang saving the
      addresses to printf format strings on the stack.
      
      While this issue is investigated, bump up the frame size limit for xmon
      when building with clang.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/252
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • F
      powerpc/process: Fix flush_all_to_thread for SPE · 4e4cad43
      Felipe Rechia committed
      [ Upstream commit e901378578c62202594cba0f6c076f3df365ec91 ]
      
      Fix a bug introduced by the creation of flush_all_to_thread() for
      processors that have SPE (Signal Processing Engine) and use it to
      compute floating-point operations.
      
      From a userspace perspective, the problem was seen in attempts to
      compute floating-point operations which should generate exceptions.
      For example:
      
        fork();
        float x = 0.0 / 0.0;
        isnan(x);           // forked process returns False (should be True)
      
      The operation above also should always cause the SPEFSCR FINV bit to
      be set. However, the SPE floating-point exceptions were turned off
      after a fork().
      
      Kernel versions prior to the bug used flush_spe_to_thread(), which
      first saves SPEFSCR register values in tsk->thread and then calls
      giveup_spe(tsk).
      
      After commit 579e633e, the save_all() function first called
      giveup_spe(), and only then were the SPEFSCR register values saved in
      tsk->thread. This saved the SPEFSCR register values after SPE had
      been disabled for that thread, causing the bug described above.
      
      Fixes: 579e633e ("powerpc: create flush_all_to_thread()")
      Signed-off-by: Felipe Rechia <felipe.rechia@datacom.com.br>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • N
      powerpc/64s/radix: Fix radix__flush_tlb_collapsed_pmd double flushing pmd · 3173e226
      Nicholas Piggin committed
      [ Upstream commit dd76ff5af35350fd6d5bb5b069e73b6017f66893 ]
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • M
      powerpc/mm/radix: Fix small page at boundary when splitting · b43c5287
      Michael Ellerman committed
      [ Upstream commit 81d1b54dec95209ab5e5be2cf37182885f998753 ]
      
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
      linear mapping at the text/data boundary so we can map the kernel
      text read only.
      
      Currently we always use a small page at the text/data boundary, even
      when that's not necessary:
      
        Mapped 0x0000000000000000-0x0000000000e00000 with 2.00 MiB pages
        Mapped 0x0000000000e00000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
      
      This is because the check that the mapping crosses the __init_begin
      boundary is too strict, it also returns true when we map exactly up to
      the boundary.
      
      So fix it to check that the mapping would actually map past
      __init_begin, and with that we see:
      
        Mapped 0x0000000000000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
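The difference between the too-strict check and the relaxed one described in this commit can be sketched as below. This is a minimal illustration, assuming hypothetical helper names and plain integer addresses; it is not the kernel's actual code. The only change is `>=` becoming `>`, so a mapping that ends exactly at the boundary no longer counts as crossing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Too strict: also true when the mapping ends exactly at the boundary. */
static bool crosses_boundary_strict(uint64_t addr, uint64_t size,
                                    uint64_t boundary)
{
    return addr < boundary && addr + size >= boundary;
}

/* Relaxed: true only when the mapping would extend past the boundary. */
static bool crosses_boundary(uint64_t addr, uint64_t size, uint64_t boundary)
{
    return addr < boundary && addr + size > boundary;
}
```

With the boundary at 16M (0x1000000), a 2M page starting at 0xe00000 ends exactly on the boundary: the strict check reports a crossing and forces small pages, while the relaxed check allows the 2M page, matching the log output quoted in the message.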
    • M
      powerpc/mm/radix: Fix overuse of small pages in splitting logic · b499fa07
      Michael Ellerman committed
      [ Upstream commit 3b5657ed5b4e27ccf593a41ff3c5aa27dae8df18 ]
      
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
      linear mapping at the text/data boundary so we can map the kernel text
      read only.
      
      But the current logic uses small pages for the entire text section,
      regardless of whether a larger page size would fit. eg. with the
      boundary at 16M we could use 2M pages, but instead we use 64K pages up
      to the 16M boundary:
      
        Mapped 0x0000000000000000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      This is because the test is checking if addr is < __init_begin
      and addr + mapping_size is >= _stext. But that is true for all pages
      between _stext and __init_begin.
      
      Instead what we want to check is if we are crossing the text/data
      boundary, which is at __init_begin. With that fixed we see:
      
        Mapped 0x0000000000000000-0x0000000000e00000 with 2.00 MiB pages
        Mapped 0x0000000000e00000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      ie. we're correctly using 2MB pages below __init_begin, but we still
      drop down to 64K pages unnecessarily at the boundary.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
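The two tests contrasted in this commit message can be sketched as follows. The function names and integer-address parameters are assumptions made for illustration, not the kernel's code. The old test is true for every mapping that touches the text section, while the new one is true only near the text/data boundary (it still uses `>=`, so a mapping ending exactly at the boundary is treated as crossing, which the message notes as remaining work).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old test: true for any mapping overlapping [stext, init_begin). */
static bool needs_small_page_old(uint64_t addr, uint64_t size,
                                 uint64_t stext, uint64_t init_begin)
{
    return addr < init_begin && addr + size >= stext;
}

/* New test: true only when the mapping reaches the text/data boundary. */
static bool needs_small_page_new(uint64_t addr, uint64_t size,
                                 uint64_t init_begin)
{
    return addr < init_begin && addr + size >= init_begin;
}
```

For a 2M mapping at address 0 with the boundary at 16M, the old test demands small pages and the new test does not, which corresponds to the first `Mapped` line changing from 64K to 2M pages.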
    • M
      powerpc/mm/radix: Fix off-by-one in split mapping logic · 434551e9
      Michael Ellerman committed
      [ Upstream commit 5c6499b7041b43807dfaeda28aa87fc0e62558f7 ]
      
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we try to split the
      kernel linear (1:1) mapping so that the kernel text is in a separate
      page to kernel data, so we can mark the former read-only.
      
      We could achieve that just by always using 64K pages for the linear
      mapping, but we try to be smarter. Instead we use huge pages when
      possible, and only switch to smaller pages when necessary.
      
      However we have an off-by-one bug in that logic, which causes us to
      calculate the wrong boundary between text and data.
      
      For example with the end of the kernel text at 16M we see:
      
        radix-mmu: Mapped 0x0000000000000000-0x0000000001200000 with 64.0 KiB pages
        radix-mmu: Mapped 0x0000000001200000-0x0000000040000000 with 2.00 MiB pages
        radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      ie. we mapped from 0 to 18M with 64K pages, even though the boundary
      between text and data is at 16M.
      
      With the fix we see we're correctly hitting the 16M boundary:
      
        radix-mmu: Mapped 0x0000000000000000-0x0000000001000000 with 64.0 KiB pages
        radix-mmu: Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • A
      powerpc/pseries: Export raw per-CPU VPA data via debugfs · ee35e01b
      Aravinda Prasad committed
      [ Upstream commit c6c26fb55e8e4b3fc376be5611685990a17de27a ]
      
      This patch exports the raw per-CPU VPA data via debugfs.
      A per-CPU file is created which exports the VPA data of
      that CPU to help debug some of the VPA related issues or
      to analyze the per-CPU VPA related statistics.
      
      v3: Removed offline CPU check.
      
      v2: Included offline CPU check and other review comments.
      Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • S
      powerpc/eeh: Fix use of EEH_PE_KEEP on wrong field · 97aab1a4
      Sam Bobroff committed
      [ Upstream commit 473af09b56dc4be68e4af33220ceca6be67aa60d ]
      
      eeh_add_to_parent_pe() sometimes removes the EEH_PE_KEEP flag, but it
      incorrectly removes it from pe->type, instead of pe->state.
      
      However, rather than clearing it from the correct field, remove it.
      Inspection of the code shows that it can't ever have had any effect
      (even if it had been cleared from the correct field), because the
      field is never tested after it is cleared by the statement in
      question.
      
      The clear statement was added by commit 807a827d ("powerpc/eeh:
      Keep PE during hotplug"), but it didn't explain why it was necessary.
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • S
      powerpc/eeh: Fix null deref for devices removed during EEH · bd2a7e53
      Sam Bobroff committed
      [ Upstream commit bcbe3730531239abd45ab6c6af4a18078b37dd47 ]
      
      If a device is removed during EEH processing (either by a driver's
      handler or as part of recovery), it can lead to a null dereference
      in eeh_pe_report_edev().
      
      To handle this, skip devices that have been removed.
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • J
      powerpc/boot: Disable vector instructions · 16e4657a
      Joel Stanley committed
      [ Upstream commit e8e132e6885962582784b6fa16a80d07ea739c0f ]
      
      This will avoid auto-vectorisation when building with higher
      optimisation levels.
      
      We don't know if the machine can support VSX and even if it's present
      it's probably not going to be enabled at this point in boot.
      
      These flags were both added prior to GCC 4.6 which is the minimum
      compiler version supported by upstream, thanks to Segher for the
      details.
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • J
      powerpc/boot: Fix opal console in boot wrapper · 5346c840
      Joel Stanley committed
      [ Upstream commit 1a855eaccf353f7ed1d51a3d4b3af727ccbd81ca ]
      
      As of commit 10c77dba ("powerpc/boot: Fix build failure in 32-bit
      boot wrapper") the opal code is hidden behind CONFIG_PPC64_BOOT_WRAPPER,
      but the boot wrapper avoids include/linux, so it does not get the normal
      Kconfig flags.
      
      We can drop the guard entirely as in commit f8e8e69c ("powerpc/boot:
      Only build OPAL code when necessary") the makefile only includes opal.c
      in the build if CONFIG_PPC64_BOOT_WRAPPER is set.
      
      Fixes: 10c77dba ("powerpc/boot: Fix build failure in 32-bit boot wrapper")
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • D
      powerpc: Fix signedness bug in update_flash_db() · 4505cff2
      Dan Carpenter committed
      [ Upstream commit 014704e6f54189a203cc14c7c0bb411b940241bc ]
      
      The "count < sizeof(struct os_area_db)" comparison is type promoted to
      size_t so negative values of "count" are treated as very high values
      and we accidentally return success instead of a negative error code.
      
      This doesn't really change runtime much but it fixes a static checker
      warning.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Geoff Levand <geoff@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
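The promotion problem described in this commit can be reproduced in a few lines of C. This is a sketch under assumptions: `struct record` and both function names are hypothetical stand-ins, and the explicit `(size_t)` cast in the buggy version only makes visible the conversion that the compiler would otherwise apply implicitly when a signed value is compared with a `sizeof` result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure whose size is compared. */
struct record {
    char bytes[64];
};

/*
 * Buggy: a negative count converts to a huge size_t, so the "too short"
 * branch is skipped and the function reports success.
 */
static int check_len_buggy(long count)
{
    if ((size_t)count < sizeof(struct record))
        return -1;
    return 0;
}

/* One possible fix: reject negative values before the unsigned comparison. */
static int check_len_fixed(long count)
{
    if (count < 0 || (size_t)count < sizeof(struct record))
        return -1;
    return 0;
}
```

Calling `check_len_buggy(-5)` returns 0 (success), whereas `check_len_fixed(-5)` returns -1, which is the behaviour change the message describes.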
  2. 24 Nov, 2019 8 commits
  3. 21 Nov, 2019 7 commits
    • R
      libfdt: Ensure INT_MAX is defined in libfdt_env.h · 0729f87b
      Rob Herring committed
      [ Upstream commit 53dd9dce6979bc54d64a3a09a2fb20187a025be7 ]
      
      The next update of libfdt has a new dependency on INT_MAX. Update the
      instances of libfdt_env.h in the kernel to either include the necessary
      header with the definition or define it locally.
      
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • A
      powerpc: Fix duplicate const clang warning in user access code · eb355ccf
      Anton Blanchard committed
      [ Upstream commit e00d93ac9a189673028ac125a74b9bc8ae73eebc ]
      
      This re-applies commit b91c1e3e ("powerpc: Fix duplicate const
      clang warning in user access code") (Jun 2015) which was undone in
      commits:
        f2ca8090 ("powerpc/sparse: Constify the address pointer in __get_user_nosleep()") (Feb 2017)
        d466f6c5 ("powerpc/sparse: Constify the address pointer in __get_user_nocheck()") (Feb 2017)
        f84ed59a ("powerpc/sparse: Constify the address pointer in __get_user_check()") (Feb 2017)
      
      We see a large number of duplicate const errors in the user access
      code when building with llvm/clang:
      
        include/linux/pagemap.h:576:8: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
              ret = __get_user(c, uaddr);
      
      The problem is we are doing const __typeof__(*(ptr)), which will hit
      the warning if ptr is marked const.
      
      Removing const does not seem to have any effect on GCC code
      generation.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
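The pattern behind the warning can be sketched with a small macro. This is an illustration only: `PEEK` is a hypothetical macro, not the kernel's `__get_user()`, and it relies on the GNU C extensions `__typeof__` and statement expressions supported by GCC and clang. The problematic form is shown in a comment so the example builds cleanly.

```c
#include <assert.h>

/*
 * Problematic form (kept as a comment): if ptr is "const int *", then
 * __typeof__(*(ptr)) is already "const int", and the extra qualifier
 * yields "const const int", which clang reports as a duplicate 'const'.
 *
 *   const __typeof__(*(ptr)) *p_ = (ptr);
 */

/* Form without the extra qualifier: the pointee type carries its own const. */
#define PEEK(ptr)                          \
    ({                                     \
        __typeof__(*(ptr)) *p_ = (ptr);    \
        *p_;                               \
    })
```

`PEEK` accepts both `const int *` and `int *` arguments, since the qualifier, when present, already comes from the type of `*(ptr)`.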
    • N
      powerpc/pseries: Disable CPU hotplug across migrations · e7b37640
      Nathan Fontenot committed
      [ Upstream commit 85a88cabad57d26d826dd94ea34d3a785824d802 ]
      
      When performing partition migrations all present CPUs must be online
      as all present CPUs must make the H_JOIN call as part of the migration
      process. Once all present CPUs make the H_JOIN call, one CPU is returned
      to make the rtas call to perform the migration to the destination system.
      
      During testing of migration and changing the SMT state we have found
      instances where CPUs are offlined, as part of the SMT state change,
      before they make the H_JOIN call. This results in a hung system where
      every CPU is either in H_JOIN or offline.
      
      To prevent this, this patch disables CPU hotplug during the migration
      process.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • N
      powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR request · 9271304c
      Nathan Fontenot committed
      [ Upstream commit 063b8b1251fd069f3740339fca56119d218f11ba ]
      
      The updates to powerpc numa and memory hotplug code now use the
      in-kernel LMB array instead of the device tree. This change allows the
      pseries memory DLPAR code to only update the device tree once after
      successfully handling a DLPAR request.
      
      Prior to the in-kernel LMB array, the numa code looked up the affinity
      for memory being added in the device tree, the code now looks this up
      in the LMB array. This change means the memory hotplug code can just
      update the affinity for an LMB in the LMB array instead of updating
      the device tree.
      
      This also provides a savings in kernel memory. When updating the
      device tree, old properties are never freed since there is no
      usecount on properties. This behavior leads to a new copy of the
      property being allocated every time a LMB is added or removed (i.e. a
      request to add 100 LMBs creates 100 new copies of the property). With
      this update only a single new property is created when a DLPAR request
      completes successfully.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • N
      powerpc/64s/hash: Fix stab_rr off by one initialization · 0ab2545a
      Nicholas Piggin committed
      [ Upstream commit 09b4438db13fa83b6219aee5993711a2aa2a0c64 ]
      
      This causes SLB allocation to start 1 beyond the start of the SLB.
      There is no real problem because after it wraps it starts behaving
      properly, it's just surprising to see when looking at SLB traces.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • B
      powerpc/iommu: Avoid derefence before pointer check · 089b169c
      Breno Leitao committed
      [ Upstream commit 984ecdd68de0fa1f63ce205d6c19ef5a7bc67b40 ]
      
      The tbl pointer is being dereferenced by IOMMU_PAGE_SIZE prior to the
      check that it is not NULL.
      
      Move the dereference to after the check, where 'tbl' is guaranteed
      not to be NULL.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
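The ordering fix described here follows a common pattern, sketched below. The `struct table` type, its `page_shift` field, and `mapped_bytes()` are hypothetical names chosen for illustration and are not the kernel's IOMMU structures; the point is only that the pointer is read after the NULL check rather than in an initializer before it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table descriptor; only the field needed here. */
struct table {
    unsigned long page_shift;
};

/* Return the mapped length in bytes, or -1 if no table is given. */
static long mapped_bytes(const struct table *tbl, unsigned long npages)
{
    unsigned long page_size;

    if (!tbl)
        return -1;

    /* Dereference only after the check above has ruled out NULL. */
    page_size = 1UL << tbl->page_shift;
    return (long)(npages * page_size);
}
```

Computing `page_size` in the declaration, before the `if (!tbl)` test, would dereference a NULL pointer on the error path.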
    • A
      powerpc/vdso: Correct call frame information · e70ccd8a
      Alan Modra committed
      [ Upstream commit 56d20861c027498b5a1112b4f9f05b56d906fdda ]
      
      Call Frame Information is used by gdb for back-traces and inserting
      breakpoints on function return for the "finish" command.  This failed
      when inside __kernel_clock_gettime.  More concerning than difficulty
      debugging is that CFI is also used by stack frame unwinding code to
      implement exceptions.  If you have an app that needs to handle
      asynchronous exceptions for some reason, and you are unlucky enough to
      get one inside the VDSO time functions, your app will crash.
      
      What's wrong:  There is control flow in __kernel_clock_gettime that
      reaches label 99 without saving lr in r12.  CFI info however is
      interpreted by the unwinder without reference to control flow: It's a
      simple matter of "Execute all the CFI opcodes up to the current
      address".  That means the unwinder thinks r12 contains the return
      address at label 99.  Disabuse it of that notion by resetting CFI for
      the return address at label 99.
      
      Note that the ".cfi_restore lr" could have gone anywhere from the
      "mtlr r12" a few instructions earlier to the instruction at label 99.
      I put the CFI as late as possible, because in general that's best
      practice (and if possible grouped with other CFI in order to reduce
      the number of CFI opcodes executed when unwinding).  Using r12 as the
      return address is perfectly fine after the "mtlr r12" since r12 on
      that code path still contains the return address.
      
      __get_datapage also has a CFI error.  That function temporarily saves
      lr in r0, and reflects that fact with ".cfi_register lr,r0".  A later
      use of r0 means the CFI at that point isn't correct, as r0 no longer
      contains the return address.  Fix that too.
      Signed-off-by: Alan Modra <amodra@gmail.com>
      Tested-by: Reza Arbab <arbab@linux.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  4. 10 Nov, 2019 1 commit
  5. 06 Nov, 2019 2 commits
  6. 12 Oct, 2019 6 commits
    • A
      powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag · d1e4b4cc
      Aneesh Kumar K.V committed
      commit 09ce98cacd51fcd0fa0af2f79d1e1d3192f4cbb0 upstream.
      
      Rename the #define to indicate this is related to store vs tlbie
      ordering issue. In the next patch, we will be adding another feature
      flag that is used to handle the ERAT flush vs tlbie ordering issue.
      
      Fixes: a5d4b589 ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • G
      powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt() · f5f31a6e
      Gautham R. Shenoy committed
      [ Upstream commit c784be435d5dae28d3b03db31753dd7a18733f0c ]
      
      The calls to arch_add_memory()/arch_remove_memory() are always made
      with the read-side cpu_hotplug_lock acquired via mem_hotplug_begin().
      On pSeries, arch_add_memory()/arch_remove_memory() eventually call
      resize_hpt() which in turn calls stop_machine() which acquires the
      read-side cpu_hotplug_lock again, thereby resulting in the recursive
      acquisition of this lock.
      
      In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system
      lockup during a memory hotplug operation because cpus_read_lock() is a
      per-cpu rwsem read, which, in the fast-path (in the absence of the
      writer, which in our case is a CPU-hotplug operation) simply
      increments the read_count on the semaphore. Thus a recursive read in
      the fast-path doesn't cause any problems.
      
      However, we can hit this problem in practice if there is a concurrent
      CPU-Hotplug operation in progress which is waiting to acquire the
      write-side of the lock. This will cause the second recursive read to
      block until the writer finishes, while the writer is itself blocked
      because the first read holds the lock. Thus neither the reader nor
      the writer makes any progress, blocking both CPU-Hotplug and
      Memory Hotplug operations.
      
      Memory-Hotplug				CPU-Hotplug
      CPU 0					CPU 1
      ------                                  ------
      
      1. down_read(cpu_hotplug_lock.rw_sem)
         [mem_hotplug_begin]
      					2. down_write(cpu_hotplug_lock.rw_sem)
      					[cpu_up/cpu_down]
      3. down_read(cpu_hotplug_lock.rw_sem)
         [stop_machine()]
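
The three steps in the diagram can be replayed with a toy, single-threaded model. This is only a sketch: the class and function names are hypothetical, and the model assumes nothing more than the behaviour described above (a queued writer makes new readers wait, and a writer waits for existing readers).

```python
class WriterPreferringRWSem:
    """Toy model of a read/write semaphore; not kernel code."""

    def __init__(self):
        self.readers = 0             # number of read-side holders
        self.writer_waiting = False  # a writer has queued for the lock

    def try_down_read(self):
        # Fast path: with no writer queued, a read only bumps the count,
        # so even a recursive read succeeds.
        if self.writer_waiting:
            return False             # would block behind the queued writer
        self.readers += 1
        return True

    def try_down_write(self):
        # A writer needs every reader gone; until then it stays queued.
        if self.readers:
            self.writer_waiting = True
            return False
        return True


def replay(with_concurrent_writer):
    """Replay steps 1-3 of the diagram.

    Returns (writer_acquired, second_read_acquired); writer_acquired is
    None when no CPU-hotplug writer takes part.
    """
    sem = WriterPreferringRWSem()
    sem.try_down_read()                        # 1. memory hotplug begins
    writer_acquired = None
    if with_concurrent_writer:
        writer_acquired = sem.try_down_write() # 2. CPU hotplug wants the lock
    second_read = sem.try_down_read()          # 3. recursive read
    return writer_acquired, second_read
```

Without a writer, `replay(False)` shows the recursive read succeeding. With one, `replay(True)` shows both the writer and the second read failing to acquire the lock, which is the deadlock.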
      
      Lockdep complains as follows in these code-paths.
      
       swapper/0/1 is trying to acquire lock:
       (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60
      
      but task is already holding lock:
      (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(cpu_hotplug_lock.rw_sem);
         lock(cpu_hotplug_lock.rw_sem);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       3 locks held by swapper/0/1:
        #0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0
        #1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
        #2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x54/0x1a0
      
      stack backtrace:
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty #166
       Call Trace:
         dump_stack+0xe8/0x164 (unreliable)
         __lock_acquire+0x1110/0x1c70
         lock_acquire+0x240/0x290
         cpus_read_lock+0x64/0xf0
         stop_machine+0x2c/0x60
         pseries_lpar_resize_hpt+0x19c/0x2c0
         resize_hpt_for_hotplug+0x70/0xd0
         arch_add_memory+0x58/0xfc
         devm_memremap_pages+0x5e8/0x8f0
         pmem_attach_disk+0x764/0x830
         nvdimm_bus_probe+0x118/0x240
         really_probe+0x230/0x4b0
         driver_probe_device+0x16c/0x1e0
         __driver_attach+0x148/0x1b0
         bus_for_each_dev+0x90/0x130
         driver_attach+0x34/0x50
         bus_add_driver+0x1a8/0x360
         driver_register+0x108/0x170
         __nd_driver_register+0xd0/0xf0
         nd_pmem_driver_init+0x34/0x48
         do_one_initcall+0x1e0/0x45c
         kernel_init_freeable+0x540/0x64c
         kernel_init+0x2c/0x160
         ret_from_kernel_thread+0x5c/0x68
      
      Fix this issue by
        1) Requiring all the calls to pseries_lpar_resize_hpt() be made
           with cpu_hotplug_lock held.
      
        2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
           as a consequence of 1)
      
        3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
           with cpu_hotplug_lock held.
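
The locking convention in steps 1) to 3) can be sketched as follows. This is an illustrative model rather than the kernel change itself: a non-reentrant `threading.Lock` stands in for cpu_hotplug_lock, and the Python functions only borrow the kernel names.

```python
import threading

# Stand-in for cpu_hotplug_lock; deliberately non-reentrant.
cpu_hotplug_lock = threading.Lock()


def stop_machine_cpuslocked(fn):
    # The caller must already hold the lock, so none is taken here.
    # Note: locked() only says that *someone* holds it, which is enough
    # for this single-threaded sketch.
    if not cpu_hotplug_lock.locked():
        raise RuntimeError("caller must hold cpu_hotplug_lock")
    return fn()


def stop_machine(fn):
    # Variant for callers that do NOT hold the lock yet.
    with cpu_hotplug_lock:
        return stop_machine_cpuslocked(fn)


def resize_hpt(order):
    # Steps 1) and 2): runs with the lock held by the caller, so it uses
    # the _cpuslocked variant instead of acquiring the lock again.
    return stop_machine_cpuslocked(lambda: f"resized to order {order}")


def hpt_order_set(order):
    # Step 3): this entry point takes the lock itself before resizing.
    with cpu_hotplug_lock:
        return resize_hpt(order)


def memory_hotplug(order):
    # The hotplug path already holds the lock when it reaches resize_hpt().
    with cpu_hotplug_lock:
        return resize_hpt(order)
```

Both `hpt_order_set(18)` and `memory_hotplug(18)` acquire the lock exactly once, while calling `resize_hpt(18)` without the lock raises an error instead of silently running unprotected.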
      
      Fixes: dbcf929c ("powerpc/pseries: Add support for hash table resizing")
      Cc: stable@vger.kernel.org # v4.11+
      Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1557906352-29048-1-git-send-email-ego@linux.vnet.ibm.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f5f31a6e
    • C
      KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP · 34b13ff6
      Cédric Le Goater committed
      [ Upstream commit 237aed48c642328ff0ab19b63423634340224a06 ]
      
      When a vCPU is brought down, the XIVE VP (Virtual Processor) is first
      disabled and then the event notification queues are freed. When freeing
      the queues, we check for possible escalation interrupts and free them
      also.
      
      But when a XIVE VP is disabled, the underlying XIVE ENDs are also
      disabled in OPAL. When an END (Event Notification Descriptor) is
      disabled, its ESB pages (ESn and ESe) are disabled and loads return all
      1s. This means that any access to the ESB page of the escalation
      interrupt will return invalid values.
      
      When an interrupt is freed, the shutdown handler computes a 'saved_p'
      field from the value returned by a load in xive_do_source_set_mask().
      This value is incorrect for escalation interrupts for the reason
      described above.
      
      This has no impact on Linux/KVM today because the value is not used.
      However, future changes will introduce a xive_get_irqchip_state()
      handler that uses the 'saved_p' field to return the state of an
      interrupt. With 'saved_p' incorrect, a softlockup will occur.
      
      Fix the vCPU cleanup sequence by first freeing the escalation interrupts,
      if any, then disabling the XIVE VP, and finally freeing the queues.
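
Why the order matters can be shown with a small model. Everything here is a hypothetical sketch: the class, the 64-bit all-ones constant and the bit position used for P are assumptions made for illustration, not the XIVE driver's definitions.

```python
ALL_ONES = 0xFFFFFFFFFFFFFFFF  # what a load from a disabled ESB page returns
ESB_VAL_P = 0x2                # assumed position of the P bit in a valid load


class ToyVP:
    """Toy XIVE VP with one escalation interrupt whose real P bit is clear."""

    def __init__(self):
        self.enabled = True
        self.esb_pq = 0b00     # true P/Q state of the escalation interrupt
        self.saved_p = None
        self.queues_freed = False

    def esb_load(self):
        # Disabling the VP disables its ENDs, so ESB loads become invalid.
        return self.esb_pq if self.enabled else ALL_ONES

    def free_escalation_irq(self):
        # The shutdown path derives saved_p from an ESB load.
        self.saved_p = bool(self.esb_load() & ESB_VAL_P)

    def disable(self):
        self.enabled = False

    def free_queues(self):
        self.queues_freed = True


def cleanup_old(vp):
    # Old order: disable first, free escalation interrupts while freeing queues.
    vp.disable()
    vp.free_escalation_irq()   # reads all 1s, so saved_p is wrongly True
    vp.free_queues()
    return vp.saved_p


def cleanup_fixed(vp):
    # Fixed order: escalation interrupts, then the VP, then the queues.
    vp.free_escalation_irq()   # ESB page is still valid here
    vp.disable()
    vp.free_queues()
    return vp.saved_p
```

In this model `cleanup_old(ToyVP())` records a P bit that was never set, whereas `cleanup_fixed(ToyVP())` records the real value.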
      
      Fixes: 90c73795afa2 ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode")
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190806172538.5087-1-clg@kaod.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      34b13ff6
    • A
      powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisions · 9124eac4
      Aneesh Kumar K.V committed
      commit 677733e296b5c7a37c47da391fc70a43dc40bd67 upstream.
      
      The store ordering vs tlbie issue mentioned in commit
      a5d4b589 ("powerpc/mm: Fixup tlbie vs store ordering issue on
      POWER9") is fixed for Nimbus 2.3 and Cumulus 1.3 revisions. We don't
      need to apply the fixup if we are running on them.
      
      We can only do this on PowerNV. On a pseries guest with KVM we still
      don't support redoing the feature fixup after migration, so we should
      enable all the workarounds needed, because we can possibly
      migrate between DD 2.3 and DD 2.2.
      
      Fixes: a5d4b589 ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190924035254.24612-1-aneesh.kumar@linux.ibm.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9124eac4
    • A
      powerpc/powernv/ioda: Fix race in TCE level allocation · 19c12f12
      Alexey Kardashevskiy committed
      commit 56090a3902c80c296e822d11acdb6a101b322c52 upstream.
      
      pnv_tce() returns a pointer to a TCE entry and originally a TCE table
      would be pre-allocated. For the default case of 2GB window the table
      needs only a single level and that is fine. However if more levels are
      requested, it is possible to get a race when 2 threads want a pointer
      to a TCE entry from the same page of TCEs.
      
      This adds cmpxchg to handle the race. Note that once a TCE is non-zero,
      it cannot become zero again.
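
The compare-and-exchange pattern can be sketched as below. This is an illustration under stated assumptions, not the pnv_tce() code: Python has no hardware cmpxchg, so a lock emulates its atomicity, and `get_level`, `race`, `alloc` and `free` are hypothetical names.

```python
import threading

_atomic = threading.Lock()  # emulates the atomicity of a hardware cmpxchg


def cmpxchg(slots, index, expected, new):
    """Store `new` only if the slot still holds `expected`; return the old value."""
    with _atomic:
        old = slots[index]
        if old == expected:
            slots[index] = new
        return old


def get_level(slots, index, alloc, free):
    """Return the next-level page for `index`, installing it at most once."""
    page = slots[index]
    if page != 0:
        return page                    # already populated, never zero again
    candidate = alloc()
    old = cmpxchg(slots, index, 0, candidate)
    if old != 0:
        free(candidate)                # lost the race: drop ours, use the winner's
        return old
    return candidate


def race(n_threads=8):
    """Let several threads request the same entry at once."""
    slots = [0]                        # one empty entry in the parent level
    freed, results = [], []            # list.append is thread-safe in CPython
    barrier = threading.Barrier(n_threads)

    def worker(page_id):
        barrier.wait()                 # start together to provoke the race
        results.append(get_level(slots, 0, lambda: page_id, freed.append))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(1, n_threads + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return slots[0], results, freed
```

Whichever thread wins, every caller of `race()` ends up with the same page, and the winning page is never among the freed candidates.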
      
      Fixes: a68bd126 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
      CC: stable@vger.kernel.org # v4.19+
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190718051139.74787-2-aik@ozlabs.ru
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      19c12f12
    • A
      powerpc/powernv: Restrict OPAL symbol map to only be readable by root · 032ce7d7
      Andrew Donnellan committed
      commit e7de4f7b64c23e503a8c42af98d56f2a7462bd6d upstream.
      
      Currently the OPAL symbol map is globally readable, which seems bad as
      it contains physical addresses.
      
      Restrict it to root.
      
      Fixes: c8742f85 ("powerpc/powernv: Expose OPAL firmware symbol map")
      Cc: stable@vger.kernel.org # v3.19+
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190503075253.22798-1-ajd@linux.ibm.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      032ce7d7