1. 03 4月, 2019 1 次提交
  2. 27 3月, 2019 1 次提交
    • M
      powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038 · b8ea151a
      Michael Ellerman 提交于
      commit b5b4453e7912f056da1ca7572574cada32ecb60c upstream.
      
      Jakub Drnec reported:
        Setting the realtime clock can sometimes make the monotonic clock go
        back by over a hundred years. Decreasing the realtime clock across
        the y2k38 threshold is one reliable way to reproduce. Allegedly this
        can also happen just by running ntpd, I have not managed to
        reproduce that other than booting with rtc at >2038 and then running
        ntp. When this happens, anything with timers (e.g. openjdk) breaks
        rather badly.
      
      And included a test case (slightly edited for brevity):
        #define _POSIX_C_SOURCE 199309L
        #include <stdio.h>
        #include <time.h>
        #include <stdlib.h>
        #include <unistd.h>
      
        long get_time(void) {
          struct timespec tp;
          clock_gettime(CLOCK_MONOTONIC, &tp);
          return tp.tv_sec + tp.tv_nsec / 1000000000;
        }
      
        int main(void) {
          long last = get_time();
          while(1) {
            long now = get_time();
            if (now < last) {
              printf("clock went backwards by %ld seconds!\n", last - now);
            }
            last = now;
            sleep(1);
          }
          return 0;
        }
      
      Which when run concurrently with:
       # date -s 2040-1-1
       # date -s 2037-1-1
      
      Will detect the clock going backward.
      
      The root cause is that wtom_clock_sec in struct vdso_data is only a
      32-bit signed value, even though we set its value to be equal to
      tk->wall_to_monotonic.tv_sec which is 64-bits.
      
      Because the monotonic clock starts at zero when the system boots the
      wall_to_montonic.tv_sec offset is negative for current and future
      dates. Currently on a freshly booted system the offset will be in the
      vicinity of negative 1.5 billion seconds.
      
      However if the wall clock is set past the Y2038 boundary, the offset
      from wall to monotonic becomes less than negative 2^31, and no longer
      fits in 32-bits. When that value is assigned to wtom_clock_sec it is
      truncated and becomes positive, causing the VDSO assembly code to
      calculate CLOCK_MONOTONIC incorrectly.
      
      That causes CLOCK_MONOTONIC to jump ahead by ~4 billion seconds which
      it is not meant to do. Worse, if the time is then set back before the
      Y2038 boundary CLOCK_MONOTONIC will jump backward.
      
      We can fix it simply by storing the full 64-bit offset in the
      vdso_data, and using that in the VDSO assembly code. We also shuffle
      some of the fields in vdso_data to avoid creating a hole.
      
      The original commit that added the CLOCK_MONOTONIC support to the VDSO
      did actually use a 64-bit value for wtom_clock_sec, see commit
      a7f290da ("[PATCH] powerpc: Merge vdso's and add vdso support to
      32 bits kernel") (Nov 2005). However just 3 days later it was
      converted to 32-bits in commit 0c37ec2a ("[PATCH] powerpc: vdso
      fixes (take #2)"), and the bug has existed since then AFAICS.
      
      Fixes: 0c37ec2a ("[PATCH] powerpc: vdso fixes (take #2)")
      Cc: stable@vger.kernel.org # v2.6.15+
      Link: http://lkml.kernel.org/r/HaC.ZfES.62bwlnvAvMP.1STMMj@seznam.czReported-by: NJakub Drnec <jaydee@email.cz>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8ea151a
  3. 24 3月, 2019 5 次提交
    • C
      powerpc/traps: Fix the message printed when stack overflows · d6d004b3
      Christophe Leroy 提交于
      commit 9bf3d3c4e4fd82c7174f4856df372ab2a71005b9 upstream.
      
      Today's message is useless:
      
        [   42.253267] Kernel stack overflow in process (ptrval), r1=c65500b0
      
      This patch fixes it:
      
        [   66.905235] Kernel stack overflow in process sh[356], r1=c65560b0
      
      Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      [mpe: Use task_pid_nr()]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6d004b3
    • C
      powerpc/traps: fix recoverability of machine check handling on book3s/32 · 461a52a4
      Christophe Leroy 提交于
      commit 0bbea75c476b77fa7d7811d6be911cc7583e640f upstream.
      
      Looks like book3s/32 doesn't set RI on machine check, so
      checking RI before calling die() will always be fatal
      allthought this is not an issue in most cases.
      
      Fixes: b96672dd ("powerpc: Machine check interrupt is a non-maskable interrupt")
      Fixes: daf00ae71dad ("powerpc/traps: restore recoverability of machine_check interrupts")
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Cc: stable@vger.kernel.org
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      461a52a4
    • M
      powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning · 9d2e929c
      Michael Ellerman 提交于
      commit ca6d5149d2ad0a8d2f9c28cbe379802260a0a5e0 upstream.
      
      GCC 8 warns about the logic in vr_get/set(), which with -Werror breaks
      the build:
      
        In function ‘user_regset_copyin’,
            inlined from ‘vr_set’ at arch/powerpc/kernel/ptrace.c:628:9:
        include/linux/regset.h:295:4: error: ‘memcpy’ offset [-527, -529] is
        out of the bounds [0, 16] of object ‘vrsave’ with type ‘union
        <anonymous>’ [-Werror=array-bounds]
        arch/powerpc/kernel/ptrace.c: In function ‘vr_set’:
        arch/powerpc/kernel/ptrace.c:623:5: note: ‘vrsave’ declared here
           } vrsave;
      
      This has been identified as a regression in GCC, see GCC bug 88273.
      
      However we can avoid the warning and also simplify the logic and make
      it more robust.
      
      Currently we pass -1 as end_pos to user_regset_copyout(). This says
      "copy up to the end of the regset".
      
      The definition of the regset is:
      	[REGSET_VMX] = {
      		.core_note_type = NT_PPC_VMX, .n = 34,
      		.size = sizeof(vector128), .align = sizeof(vector128),
      		.active = vr_active, .get = vr_get, .set = vr_set
      	},
      
      The end is calculated as (n * size), ie. 34 * sizeof(vector128).
      
      In vr_get/set() we pass start_pos as 33 * sizeof(vector128), meaning
      we can copy up to sizeof(vector128) into/out-of vrsave.
      
      The on-stack vrsave is defined as:
        union {
      	  elf_vrreg_t reg;
      	  u32 word;
        } vrsave;
      
      And elf_vrreg_t is:
        typedef __vector128 elf_vrreg_t;
      
      So there is no bug, but we rely on all those sizes lining up,
      otherwise we would have a kernel stack exposure/overwrite on our
      hands.
      
      Rather than relying on that we can pass an explict end_pos based on
      the sizeof(vrsave). The result should be exactly the same but it's
      more obviously not over-reading/writing the stack and it avoids the
      compiler warning.
      Reported-by: NMeelis Roos <mroos@linux.ee>
      Reported-by: NMathieu Malaterre <malat@debian.org>
      Cc: stable@vger.kernel.org
      Tested-by: NMathieu Malaterre <malat@debian.org>
      Tested-by: NMeelis Roos <mroos@linux.ee>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d2e929c
    • M
      powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest · 344996a8
      Mark Cave-Ayland 提交于
      commit fe1ef6bcdb4fca33434256a802a3ed6aacf0bd2f upstream.
      
      Commit 8792468d "powerpc: Add the ability to save FPU without
      giving it up" unexpectedly removed the MSR_FE0 and MSR_FE1 bits from
      the bitmask used to update the MSR of the previous thread in
      __giveup_fpu() causing a KVM-PR MacOS guest to lockup and panic the
      host kernel.
      
      Leaving FE0/1 enabled means unrelated processes might receive FPEs
      when they're not expecting them and crash. In particular if this
      happens to init the host will then panic.
      
      eg (transcribed):
        qemu-system-ppc[837]: unhandled signal 8 at 12cc9ce4 nip 12cc9ce4 lr 12cc9ca4 code 0
        systemd[1]: unhandled signal 8 at 202f02e0 nip 202f02e0 lr 001003d4 code 0
        Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      
      Reinstate these bits to the MSR bitmask to enable MacOS guests to run
      under 32-bit KVM-PR once again without issue.
      
      Fixes: 8792468d ("powerpc: Add the ability to save FPU without giving it up")
      Cc: stable@vger.kernel.org # v4.6+
      Signed-off-by: NMark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      344996a8
    • C
      powerpc/32: Clear on-stack exception marker upon exception return · 40b97853
      Christophe Leroy 提交于
      commit 9580b71b5a7863c24a9bd18bcd2ad759b86b1eff upstream.
      
      Clear the on-stack STACK_FRAME_REGS_MARKER on exception exit in order
      to avoid confusing stacktrace like the one below.
      
        Call Trace:
        [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable)
        [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180
        [c0e9dd10] [c0895130] memchr+0x24/0x74
        [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574
        [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8
        [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4
        --- interrupt: c0e9df00 at 0x400f330
            LR = init_stack+0x1f00/0x2000
        [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable)
        [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
        [c0e9df50] [c0c15434] start_kernel+0x310/0x488
        [c0e9dff0] [00003484] 0x3484
      
      With this patch the trace becomes:
      
        Call Trace:
        [c0e9dca0] [c01c42c0] print_address_description+0x64/0x2bc (unreliable)
        [c0e9dcd0] [c01c46a4] kasan_report+0xfc/0x180
        [c0e9dd10] [c0895150] memchr+0x24/0x74
        [c0e9dd30] [c00a9e58] msg_print_text+0x124/0x574
        [c0e9dde0] [c00ab730] console_unlock+0x114/0x4f8
        [c0e9de40] [c00adc80] vprintk_emit+0x188/0x1c4
        [c0e9de80] [c00ae3e4] printk+0xa8/0xcc
        [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
        [c0e9df50] [c0c15434] start_kernel+0x310/0x488
        [c0e9dff0] [00003484] 0x3484
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      40b97853
  4. 27 2月, 2019 1 次提交
  5. 13 2月, 2019 2 次提交
    • M
      powerpc/fadump: Do not allow hot-remove memory from fadump reserved area. · fc090081
      Mahesh Salgaonkar 提交于
      [ Upstream commit 0db6896ff6332ba694f1e61b93ae3b2640317633 ]
      
      For fadump to work successfully there should not be any holes in reserved
      memory ranges where kernel has asked firmware to move the content of old
      kernel memory in event of crash. Now that fadump uses CMA for reserved
      area, this memory area is now not protected from hot-remove operations
      unless it is cma allocated. Hence, fadump service can fail to re-register
      after the hot-remove operation, if hot-removed memory belongs to fadump
      reserved region. To avoid this make sure that memory from fadump reserved
      area is not hot-removable if fadump is registered.
      
      However, if user still wants to remove that memory, he can do so by
      manually stopping fadump service before hot-remove operation.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      fc090081
    • M
      powerpc/32: Add .data..Lubsan_data*/.data..Lubsan_type* sections explicitly · 0f9dff37
      Mathieu Malaterre 提交于
      [ Upstream commit beba24ac59133cb36ecd03f9af9ccb11971ee20e ]
      
      When both `CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y` and `CONFIG_UBSAN=y`
      are set, link step typically produce numberous warnings about orphan
      section:
      
        + powerpc-linux-gnu-ld -EB -m elf32ppc -Bstatic --orphan-handling=warn --build-id --gc-sections -X -o .tmp_vmlinux1 -T ./arch/powerpc/kernel/vmlinux.lds --who
        le-archive built-in.a --no-whole-archive --start-group lib/lib.a --end-group
        powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data393' from `init/main.o' being placed in section `.data..Lubsan_data393'.
        powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data394' from `init/main.o' being placed in section `.data..Lubsan_data394'.
        ...
        powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_type11' from `init/main.o' being placed in section `.data..Lubsan_type11'.
        powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_type12' from `init/main.o' being placed in section `.data..Lubsan_type12'.
        ...
      
      This commit remove those warnings produced at W=1.
      
      Link: https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg135407.htmlSuggested-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMathieu Malaterre <malat@debian.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0f9dff37
  6. 13 1月, 2019 4 次提交
    • B
      powerpc/tm: Set MSR[TS] just prior to recheckpoint · a854ab8f
      Breno Leitao 提交于
      commit e1c3743e1a20647c53b719dbf28b48f45d23f2cd upstream.
      
      On a signal handler return, the user could set a context with MSR[TS] bits
      set, and these bits would be copied to task regs->msr.
      
      At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
      several __get_user() are called and then a recheckpoint is executed.
      
      This is a problem since a page fault (in kernel space) could happen when
      calling __get_user(). If it happens, the process MSR[TS] bits were
      already set, but recheckpoint was not executed, and SPRs are still invalid.
      
      The page fault can cause the current process to be de-scheduled, with
      MSR[TS] active and without tm_recheckpoint() being called.  More
      importantly, without TEXASR[FS] bit set also.
      
      Since TEXASR might not have the FS bit set, and when the process is
      scheduled back, it will try to reclaim, which will be aborted because of
      the CPU is not in the suspended state, and, then, recheckpoint. This
      recheckpoint will restore thread->texasr into TEXASR SPR, which might be
      zero, hitting a BUG_ON().
      
      	kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
      	cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0]
      	    pc: c000000000054550: restore_gprs+0xb0/0x180
      	    lr: 0000000000000000
      	    sp: c00000041f157950
      	   msr: 8000000100021033
      	  current = 0xc00000041f143000
      	  paca    = 0xc00000000fb86300	 softe: 0	 irq_happened: 0x01
      	    pid   = 1021, comm = kworker/11:1
      	kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
      	Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
      	enter ? for help
      	[c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0
      	[c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0
      	[c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990
      	[c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0
      	[c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610
      	[c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140
      	[c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c
      
      This patch simply delays the MSR[TS] set, so, if there is any page fault in
      the __get_user() section, it does not have regs->msr[TS] set, since the TM
      structures are still invalid, thus avoiding doing TM operations for
      in-kernel exceptions and possible process reschedule.
      
      With this patch, the MSR[TS] will only be set just before recheckpointing
      and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
      invalid state.
      
      Other than that, if CONFIG_PREEMPT is set, there might be a preemption just
      after setting MSR[TS] and before tm_recheckpoint(), thus, this block must
      be atomic from a preemption perspective, thus, calling
      preempt_disable/enable() on this code.
      
      It is not possible to move tm_recheckpoint to happen earlier, because it is
      required to get the checkpointed registers from userspace, with
      __get_user(), thus, the only way to avoid this undesired behavior is
      delaying the MSR[TS] set.
      
      The 32-bits signal handler seems to be safe this current issue, but, it
      might be exposed to the preemption issue, thus, disabling preemption in
      this chunk of code.
      
      Changes from v2:
       * Run the critical section with preempt_disable.
      
      Fixes: 87b4e539 ("powerpc/tm: Fix return of active 64bit signals")
      Cc: stable@vger.kernel.org (v3.9+)
      Signed-off-by: NBreno Leitao <leitao@debian.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a854ab8f
    • G
      Revert "powerpc/tm: Unset MSR[TS] if not recheckpointing" · 6d9e96f3
      Greg Kroah-Hartman 提交于
      This reverts commit a9935a12 which is
      commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
      
      It breaks the powerpc build, so drop it from the tree until a fix goes
      upstream.
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Cc: Breno Leitao <leitao@debian.org>
      Cc: Michal Suchánek <msuchanek@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d9e96f3
    • J
      powerpc: Disable -Wbuiltin-requires-header when setjmp is used · 28e1d143
      Joel Stanley 提交于
      commit aea447141c7e7824b81b49acd1bc785506fba46e upstream.
      
      The powerpc kernel uses setjmp which causes a warning when building
      with clang:
      
        In file included from arch/powerpc/xmon/xmon.c:51:
        ./arch/powerpc/include/asm/setjmp.h:15:13: error: declaration of
        built-in function 'setjmp' requires inclusion of the header <setjmp.h>
              [-Werror,-Wbuiltin-requires-header]
        extern long setjmp(long *);
                    ^
        ./arch/powerpc/include/asm/setjmp.h:16:13: error: declaration of
        built-in function 'longjmp' requires inclusion of the header <setjmp.h>
              [-Werror,-Wbuiltin-requires-header]
        extern void longjmp(long *, long);
                    ^
      
      This *is* the header and we're not using the built-in setjump but
      rather the one in arch/powerpc/kernel/misc.S. As the compiler warning
      does not make sense, it for the files where setjmp is used.
      Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      [mpe: Move subdir-ccflags in xmon/Makefile to not clobber -Werror]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28e1d143
    • N
      powerpc: consolidate -mno-sched-epilog into FTRACE flags · 24e26062
      Nicholas Piggin 提交于
      commit 2a056f58fd33ccc6a0261b552b0f17e7fa4a12f3 upstream.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24e26062
  7. 10 1月, 2019 2 次提交
    • B
      powerpc/tm: Unset MSR[TS] if not recheckpointing · a9935a12
      Breno Leitao 提交于
      commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
      
      There is a TM Bad Thing bug that can be caused when you return from a
      signal context in a suspended transaction but with ucontext MSR[TS] unset.
      
      This forces regs->msr[TS] to be set at syscall entrance (since the CPU
      state is transactional). It also calls treclaim() to flush the transaction
      state, which is done based on the live (mfmsr) MSR state.
      
      Since user context MSR[TS] is not set, then restore_tm_sigcontexts() is not
      called, thus, not executing recheckpoint, keeping the CPU state as not
      transactional. When calling rfid, SRR1 will have MSR[TS] set, but the CPU
      state is non transactional, causing the TM Bad Thing with the following
      stack:
      
      	[   33.862316] Bad kernel stack pointer 3fffd9dce3e0 at c00000000000c47c
      	cpu 0x8: Vector: 700 (Program Check) at [c00000003ff7fd40]
      	    pc: c00000000000c47c: fast_exception_return+0xac/0xb4
      	    lr: 00003fff865f442c
      	    sp: 3fffd9dce3e0
      	   msr: 8000000102a03031
      	  current = 0xc00000041f68b700
      	  paca    = 0xc00000000fb84800   softe: 0        irq_happened: 0x01
      	    pid   = 1721, comm = tm-signal-sigre
      	Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
      	WARNING: exception is not recoverable, can't continue
      
      The same problem happens on 32-bits signal handler, and the fix is very
      similar, if tm_recheckpoint() is not executed, then regs->msr[TS] should be
      zeroed.
      
      This patch also fixes a sparse warning related to lack of indentation when
      CONFIG_PPC_TRANSACTIONAL_MEM is set.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      CC: Stable <stable@vger.kernel.org>	# 3.10+
      Signed-off-by: NBreno Leitao <leitao@debian.org>
      Tested-by: NMichal Suchánek <msuchanek@suse.de>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9935a12
    • D
      powerpc/fsl: Fix spectre_v2 mitigations reporting · 34fc0919
      Diana Craciun 提交于
      commit 7d8bad99ba5a22892f0cad6881289fdc3875a930 upstream.
      
      Currently for CONFIG_PPC_FSL_BOOK3E the spectre_v2 file is incorrect:
      
        $ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
        "Mitigation: Software count cache flush"
      
      Which is wrong. Fix it to report vulnerable for now.
      
      Fixes: ee13cb24 ("powerpc/64s: Add support for software count cache flush")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34fc0919
  8. 20 12月, 2018 2 次提交
    • B
      powerpc: Look for "stdout-path" when setting up legacy consoles · c97c353e
      Benjamin Herrenschmidt 提交于
      commit bf3d6afbb234156749b640b6c50f714967a85964 upstream.
      
      Commit 78e5dfea ("powerpc: dts: replace 'linux,stdout-path' with
      'stdout-path'") broke the default console on a number of embedded
      PowerPC systems, because it failed to also update the code in
      arch/powerpc/kernel/legacy_serial.c to look for that property in
      addition to the old one.
      
      This fixes it.
      
      Fixes: 78e5dfea ("powerpc: dts: replace 'linux,stdout-path' with 'stdout-path'")
      Cc: stable@vger.kernel.org # v4.17+
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: NRob Herring <robh@kernel.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c97c353e
    • R
      powerpc/msi: Fix NULL pointer access in teardown code · 73732de1
      Radu Rendec 提交于
      commit 78e7b15e17ac175e7eed9e21c6f92d03d3b0a6fa upstream.
      
      The arch_teardown_msi_irqs() function assumes that controller ops
      pointers were already checked in arch_setup_msi_irqs(), but this
      assumption is wrong: arch_teardown_msi_irqs() can be called even when
      arch_setup_msi_irqs() returns an error (-ENOSYS).
      
      This can happen in the following scenario:
        - msi_capability_init() calls pci_msi_setup_msi_irqs()
        - pci_msi_setup_msi_irqs() returns -ENOSYS
        - msi_capability_init() notices the error and calls free_msi_irqs()
        - free_msi_irqs() calls pci_msi_teardown_msi_irqs()
      
      This is easier to see when CONFIG_PCI_MSI_IRQ_DOMAIN is not set and
      pci_msi_setup_msi_irqs() and pci_msi_teardown_msi_irqs() are just
      aliases to arch_setup_msi_irqs() and arch_teardown_msi_irqs().
      
      The call to free_msi_irqs() upon pci_msi_setup_msi_irqs() failure
      seems legit, as it does additional cleanup; e.g.
      list_del(&entry->list) and kfree(entry) inside free_msi_irqs() do
      happen (MSI descriptors are allocated before pci_msi_setup_msi_irqs()
      is called and need to be cleaned up if that fails).
      
      Fixes: 6b2fd7ef ("PCI/MSI/PPC: Remove arch_msi_check_device()")
      Cc: stable@vger.kernel.org # v3.18+
      Signed-off-by: NRadu Rendec <radu.rendec@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73732de1
  9. 06 12月, 2018 1 次提交
  10. 21 11月, 2018 4 次提交
  11. 14 11月, 2018 3 次提交
  12. 05 10月, 2018 1 次提交
    • M
      powerpc: Don't print kernel instructions in show_user_instructions() · a932ed3b
      Michael Ellerman 提交于
      Recently we implemented show_user_instructions() which dumps the code
      around the NIP when a user space process dies with an unhandled
      signal. This was modelled on the x86 code, and we even went so far as
      to implement the exact same bug, namely that if the user process
      crashed with its NIP pointing into the kernel we will dump kernel text
      to dmesg. eg:
      
        bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1
        bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6
        bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048
      
      This was discovered on x86 by Jann Horn and fixed in commit
      342db04a ("x86/dumpstack: Don't dump kernel memory based on usermode RIP").
      
      Fix it by checking the adjusted NIP value (pc) and number of
      instructions against USER_DS, and bail if we fail the check, eg:
      
        bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1
        bad-bctr[2969]: Bad NIP, not dumping instructions.
      
      Fixes: 88b0fe17 ("powerpc: Add show_user_instructions()")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a932ed3b
  13. 25 9月, 2018 2 次提交
    • M
      powerpc/tm: Avoid possible userspace r1 corruption on reclaim · 96dc89d5
      Michael Neuling 提交于
      Current we store the userspace r1 to PACATMSCRATCH before finally
      saving it to the thread struct.
      
      In theory an exception could be taken here (like a machine check or
      SLB miss) that could write PACATMSCRATCH and hence corrupt the
      userspace r1. The SLB fault currently doesn't touch PACATMSCRATCH, but
      others do.
      
      We've never actually seen this happen but it's theoretically
      possible. Either way, the code is fragile as it is.
      
      This patch saves r1 to the kernel stack (which can't fault) before we
      turn MSR[RI] back on. PACATMSCRATCH is still used but only with
      MSR[RI] off. We then copy r1 from the kernel stack to the thread
      struct once we have MSR[RI] back on.
      Suggested-by: NBreno Leitao <leitao@debian.org>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      96dc89d5
    • M
      powerpc/tm: Fix userspace r13 corruption · cf13435b
      Michael Neuling 提交于
      When we treclaim we store the userspace checkpointed r13 to a scratch
      SPR and then later save the scratch SPR to the user thread struct.
      
      Unfortunately, this doesn't work as accessing the user thread struct
      can take an SLB fault and the SLB fault handler will write the same
      scratch SPRG that now contains the userspace r13.
      
      To fix this, we store r13 to the kernel stack (which can't fault)
      before we access the user thread struct.
      
      Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen
      as a random userspace segfault with r13 looking like a kernel address.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Reviewed-by: NBreno Leitao <leitao@debian.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cf13435b
  14. 17 9月, 2018 1 次提交
  15. 12 9月, 2018 1 次提交
    • A
      KVM: PPC: Avoid marking DMA-mapped pages dirty in real mode · 425333bf
      Alexey Kardashevskiy 提交于
      At the moment the real mode handler of H_PUT_TCE calls iommu_tce_xchg_rm()
      which in turn reads the old TCE and if it was a valid entry, marks
      the physical page dirty if it was mapped for writing. Since it is in
      real mode, realmode_pfn_to_page() is used instead of pfn_to_page()
      to get the page struct. However SetPageDirty() itself reads the compound
      page head and returns a virtual address for the head page struct and
      setting dirty bit for that kills the system.
      
      This adds additional dirty bit tracking into the MM/IOMMU API for use
      in the real mode. Note that this does not change how VFIO and
      KVM (in virtual mode) set this bit. The KVM (real mode) changes include:
      - use the lowest bit of the cached host phys address to carry
      the dirty bit;
      - mark pages dirty when they are unpinned which happens when
      the preregistered memory is released which always happens in virtual
      mode;
      - add mm_iommu_ua_mark_dirty_rm() helper to set delayed dirty bit;
      - change iommu_tce_xchg_rm() to take the kvm struct for the mm to use
      in the new mm_iommu_ua_mark_dirty_rm() helper;
      - move iommu_tce_xchg_rm() to book3s_64_vio_hv.c (which is the only
      caller anyway) to reduce the real mode KVM and IOMMU knowledge
      across different subsystems.
      
      This removes realmode_pfn_to_page() as it is not used anymore.
      
      While we at it, remove some EXPORT_SYMBOL_GPL() as that code is for
      the real mode only and modules cannot call it anyway.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      425333bf
  16. 21 8月, 2018 1 次提交
    • S
      powerpc/topology: Get topology for shared processors at boot · 2ea62630
      Srikar Dronamraju 提交于
      On a shared LPAR, Phyp will not update the CPU associativity at boot
      time. Just after the boot system does recognize itself as a shared
      LPAR and trigger a request for correct CPU associativity. But by then
      the scheduler would have already created/destroyed its sched domains.
      
      This causes
        - Broken load balance across Nodes causing islands of cores.
        - Performance degradation esp if the system is lightly loaded
        - dmesg to wrongly report all CPUs to be in Node 0.
        - Messages in dmesg saying borken topology.
        - With commit 051f3ca0 ("sched/topology: Introduce NUMA identity
          node sched domain"), can cause rcu stalls at boot up.
      
      The sched_domains_numa_masks table which is used to generate cpumasks
      is only created at boot time just before creating sched domains and
      never updated. Hence, its better to get the topology correct before
      the sched domains are created.
      
      For example on 64 core Power 8 shared LPAR, dmesg reports
      
        Brought up 512 CPUs
        Node 0 CPUs: 0-511
        Node 1 CPUs:
        Node 2 CPUs:
        Node 3 CPUs:
        Node 4 CPUs:
        Node 5 CPUs:
        Node 6 CPUs:
        Node 7 CPUs:
        Node 8 CPUs:
        Node 9 CPUs:
        Node 10 CPUs:
        Node 11 CPUs:
        ...
        BUG: arch topology borken
             the DIE domain not a subset of the NUMA domain
        BUG: arch topology borken
             the DIE domain not a subset of the NUMA domain
      
      numactl/lscpu output will still be correct with cores spreading across
      all nodes:
      
        Socket(s):             64
        NUMA node(s):          12
        Model:                 2.0 (pvr 004d 0200)
        Model name:            POWER8 (architected), altivec supported
        Hypervisor vendor:     pHyp
        Virtualization type:   para
        L1d cache:             64K
        L1i cache:             32K
        NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
        NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
        NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
        NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
        NUMA node4 CPU(s):     208-215,304-311,400-407,496-503
        NUMA node5 CPU(s):     168-175,264-271,360-367,456-463
        NUMA node6 CPU(s):     128-135,224-231,320-327,416-423
        NUMA node7 CPU(s):     136-143,232-239,328-335,424-431
        NUMA node8 CPU(s):     216-223,312-319,408-415,504-511
        NUMA node9 CPU(s):     144-151,240-247,336-343,432-439
        NUMA node10 CPU(s):    152-159,248-255,344-351,440-447
        NUMA node11 CPU(s):    160-167,256-263,352-359,448-455
      
      Currently on this LPAR, the scheduler detects 2 levels of Numa and
      created numa sched domains for all CPUs, but it finds a single DIE
      domain consisting of all CPUs. Hence it deletes all numa sched
      domains.
      
      To address this, detect the shared processor and update topology soon
      after CPUs are setup so that correct topology is updated just before
      scheduler creates sched domain.
      
      With the fix, dmesg reports:
      
        numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 368-375 464-471
        numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 376-383 472-479
        numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 384-391 480-487
        numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 392-399 488-495
        numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
        numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
        numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
        numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
        numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
        numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
        numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
        numa: Node 11 CPUs: 160-167 256-263 352-359 448-455
      
      and lscpu also reports:
      
        Socket(s):             64
        NUMA node(s):          12
        Model:                 2.0 (pvr 004d 0200)
        Model name:            POWER8 (architected), altivec supported
        Hypervisor vendor:     pHyp
        Virtualization type:   para
        L1d cache:             64K
        L1i cache:             32K
        NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
        NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
        NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
        NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
        NUMA node4 CPU(s):     208-215,304-311,400-407,496-503
        NUMA node5 CPU(s):     168-175,264-271,360-367,456-463
        NUMA node6 CPU(s):     128-135,224-231,320-327,416-423
        NUMA node7 CPU(s):     136-143,232-239,328-335,424-431
        NUMA node8 CPU(s):     216-223,312-319,408-415,504-511
        NUMA node9 CPU(s):     144-151,240-247,336-343,432-439
        NUMA node10 CPU(s):    152-159,248-255,344-351,440-447
        NUMA node11 CPU(s):    160-167,256-263,352-359,448-455
      Reported-by: NManjunatha H R <manjuhr1@in.ibm.com>
      Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      [mpe: Trim / format change log]
      Tested-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2ea62630
  17. 20 8月, 2018 2 次提交
    • H
      powerpc/fadump: cleanup crash memory ranges support · a5818313
      Hari Bathini 提交于
      Commit 1bd6a1c4 ("powerpc/fadump: handle crash memory ranges array
      index overflow") changed crash memory ranges to a dynamic array that
      is reallocated on-demand with krealloc(). The relevant header for this
      call was not included. The kernel compiles though. But be cautious and
      add the header anyway.
      
      Also, memory allocation logic in fadump_add_crash_memory() takes care
      of memory allocation for crash memory ranges in all scenarios. Drop
      unnecessary memory allocation in fadump_setup_crash_memory_ranges().
      
      Fixes: 1bd6a1c4 ("powerpc/fadump: handle crash memory ranges array index overflow")
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NHari Bathini <hbathini@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a5818313
    • M
      powerpc/traps: Avoid rate limit messages from show unhandled signals · 997dd26c
      Michael Ellerman 提交于
      In the recent commit to add an explicit ratelimit state when showing
      unhandled signals, commit 35a52a10 ("powerpc/traps: Use an
      explicit ratelimit state for show_signal_msg()"), I put the check of
      show_unhandled_signals and the ratelimit state before the call to
      unhandled_signal() so as to avoid unnecessarily calling the latter
      when show_unhandled_signals is false.
      
      However that causes us to check the ratelimit state on every call, so
      if we take a lot of *handled* signals that has the effect of making
      the ratelimit code print warnings that callbacks have been suppressed
      when they haven't.
      
      So rearrange the code so that we check show_unhandled_signals first,
      then call unhandled_signal() and finally check the ratelimit state.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NMurilo Opsfelder Araujo <muriloo@linux.ibm.com>
      997dd26c
  18. 14 8月, 2018 1 次提交
  19. 10 8月, 2018 5 次提交