1. 07 8月, 2019 10 次提交
    • Z
      xen/pv: Fix a boot up hang revealed by int3 self test · 11cb9f87
      Zhenzhong Duan 提交于
      [ Upstream commit b23e5844dfe78a80ba672793187d3f52e4b528d7 ]
      
      Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call()
      selftest") is used to ensure there is a gap setup in int3 exception stack
      which could be used for inserting call return address.
      
      This gap is missed in XEN PV int3 exception entry path, then below panic
      triggered:
      
      [    0.772876] general protection fault: 0000 [#1] SMP NOPTI
      [    0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11
      [    0.772893] RIP: e030:int3_magic+0x0/0x7
      [    0.772905] RSP: 3507:ffffffff82203e98 EFLAGS: 00000246
      [    0.773334] Call Trace:
      [    0.773334]  alternative_instructions+0x3d/0x12e
      [    0.773334]  check_bugs+0x7c9/0x887
      [    0.773334]  ? __get_locked_pte+0x178/0x1f0
      [    0.773334]  start_kernel+0x4ff/0x535
      [    0.773334]  ? set_init_arg+0x55/0x55
      [    0.773334]  xen_start_kernel+0x571/0x57a
      
      For 64bit PV guests, Xen's ABI enters the kernel with using SYSRET, with
      %rcx/%r11 on the stack. To convert back to "normal" looking exceptions,
      the xen thunks do 'xen_*: pop %rcx; pop %r11; jmp *'.
      
      E.g. Extracting 'xen_pv_trap xenint3' we have:
      xen_xenint3:
       pop %rcx;
       pop %r11;
       jmp xenint3
      
      As xenint3 and int3 entry code are same except xenint3 doesn't generate
      a gap, we can fix it by using int3 and drop useless xenint3.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      11cb9f87
    • A
      x86: math-emu: Hide clang warnings for 16-bit overflow · 1b84e674
      Arnd Bergmann 提交于
      [ Upstream commit 29e7e9664aec17b94a9c8c5a75f8d216a206aa3a ]
      
      clang warns about a few parts of the math-emu implementation
      where a 16-bit integer becomes negative during assignment:
      
      arch/x86/math-emu/poly_tan.c:88:35: error: implicit conversion from 'int' to 'short' changes value from 49216 to -16320 [-Werror,-Wconstant-conversion]
                                            (0x41 + EXTENDED_Ebias) | SIGN_Negative);
                                            ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
      arch/x86/math-emu/fpu_emu.h:180:58: note: expanded from macro 'setexponent16'
       #define setexponent16(x,y)  { (*(short *)&((x)->exp)) = (y); }
                                                            ~  ^
      arch/x86/math-emu/reg_constant.c:37:32: error: implicit conversion from 'int' to 'short' changes value from 49085 to -16451 [-Werror,-Wconstant-conversion]
      FPU_REG const CONST_PI2extra = MAKE_REG(NEG, -66,
                                     ^~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:21:25: note: expanded from macro 'MAKE_REG'
                      ((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
                       ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:48:28: error: implicit conversion from 'int' to 'short' changes value from 65535 to -1 [-Werror,-Wconstant-conversion]
      FPU_REG const CONST_QNaN = MAKE_REG(NEG, EXP_OVER, 0x00000000, 0xC0000000);
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:21:25: note: expanded from macro 'MAKE_REG'
                      ((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
                       ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
      
      The code is correct as is, so add a typecast to shut up the warnings.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190712090816.350668-1-arnd@arndb.deSigned-off-by: NSasha Levin <sashal@kernel.org>
      1b84e674
    • Q
      x86/apic: Silence -Wtype-limits compiler warnings · 242666b2
      Qian Cai 提交于
      [ Upstream commit ec6335586953b0df32f83ef696002063090c7aef ]
      
      There are many compiler warnings like this,
      
      In file included from ./arch/x86/include/asm/smp.h:13,
                       from ./arch/x86/include/asm/mmzone_64.h:11,
                       from ./arch/x86/include/asm/mmzone.h:5,
                       from ./include/linux/mmzone.h:969,
                       from ./include/linux/gfp.h:6,
                       from ./include/linux/mm.h:10,
                       from arch/x86/kernel/apic/io_apic.c:34:
      arch/x86/kernel/apic/io_apic.c: In function 'check_timer':
      ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
      expression >= 0 is always true [-Wtype-limits]
         if ((v) <= apic_verbosity) \
                 ^~
      arch/x86/kernel/apic/io_apic.c:2160:2: note: in expansion of macro
      'apic_printk'
        apic_printk(APIC_QUIET, KERN_INFO "..TIMER: vector=0x%02X "
        ^~~~~~~~~~~
      ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
      expression >= 0 is always true [-Wtype-limits]
         if ((v) <= apic_verbosity) \
                 ^~
      arch/x86/kernel/apic/io_apic.c:2207:4: note: in expansion of macro
      'apic_printk'
          apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
          ^~~~~~~~~~~
      
      APIC_QUIET is 0, so silence them by making apic_verbosity type int.
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1562621805-24789-1-git-send-email-cai@lca.pwSigned-off-by: NSasha Levin <sashal@kernel.org>
      242666b2
    • A
      x86: kvm: avoid constant-conversion warning · 80f58147
      Arnd Bergmann 提交于
      [ Upstream commit a6a6d3b1f867d34ba5bd61aa7bb056b48ca67cff ]
      
      clang finds a contruct suspicious that converts an unsigned
      character to a signed integer and back, causing an overflow:
      
      arch/x86/kvm/mmu.c:4605:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -205 to 51 [-Werror,-Wconstant-conversion]
                      u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
                         ~~                               ^~
      arch/x86/kvm/mmu.c:4607:38: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -241 to 15 [-Werror,-Wconstant-conversion]
                      u8 uf = (pfec & PFERR_USER_MASK) ? ~u : 0;
                         ~~                              ^~
      arch/x86/kvm/mmu.c:4609:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -171 to 85 [-Werror,-Wconstant-conversion]
                      u8 ff = (pfec & PFERR_FETCH_MASK) ? ~x : 0;
                         ~~                               ^~
      
      Add an explicit cast to tell clang that everything works as
      intended here.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Link: https://github.com/ClangBuiltLinux/linux/issues/95Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      80f58147
    • P
      MIPS: lantiq: Fix bitfield masking · a3524486
      Petr Cvek 提交于
      [ Upstream commit ba1bc0fcdeaf3bf583c1517bd2e3e29cf223c969 ]
      
      The modification of EXIN register doesn't clean the bitfield before
      the writing of a new value. After a few modifications the bitfield would
      accumulate only '1's.
      Signed-off-by: NPetr Cvek <petrcvekcz@gmail.com>
      Signed-off-by: NPaul Burton <paul.burton@mips.com>
      Cc: hauke@hauke-m.de
      Cc: john@phrozen.org
      Cc: linux-mips@vger.kernel.org
      Cc: openwrt-devel@lists.openwrt.org
      Cc: pakahmar@hotmail.com
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a3524486
    • H
      arm64: dts: rockchip: fix isp iommu clocks and power domain · fd53e45a
      Helen Koike 提交于
      [ Upstream commit c432a29d3fc9ee928caeca2f5cf68b3aebfa6817 ]
      
      isp iommu requires wrapper variants of the clocks.
      noc variants are always on and using the wrapper variants will activate
      {A,H}CLK_ISP{0,1} due to the hierarchy.
      
      Tested using the pending isp patch set (which is not upstream
      yet). Without this patch, streaming from the isp stalls.
      
      Also add the respective power domain and remove the "disabled" status.
      
      Refer:
       RK3399 TRM v1.4 Fig. 2-4 RK3399 Clock Architecture Diagram
       RK3399 TRM v1.4 Fig. 8-1 RK3399 Power Domain Partition
      Signed-off-by: NHelen Koike <helen.koike@collabora.com>
      Tested-by: NManivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: NHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      fd53e45a
    • D
      ARM: dts: rockchip: Mark that the rk3288 timer might stop in suspend · ea26b427
      Douglas Anderson 提交于
      [ Upstream commit 8ef1ba39a9fa53d2205e633bc9b21840a275908e ]
      
      This is similar to commit e6186820 ("arm64: dts: rockchip: Arch
      counter doesn't tick in system suspend").  Specifically on the rk3288
      it can be seen that the timer stops ticking in suspend if we end up
      running through the "osc_disable" path in rk3288_slp_mode_set().  In
      that path the 24 MHz clock will turn off and the timer stops.
      
      To test this, I ran this on a Chrome OS filesystem:
        before=$(date); \
        suspend_stress_test -c1 --suspend_min=30 --suspend_max=31; \
        echo ${before}; date
      
      ...and I found that unless I plug in a device that requests USB wakeup
      to be active that the two calls to "date" would show that fewer than
      30 seconds passed.
      
      NOTE: deep suspend (where the 24 MHz clock gets disabled) isn't
      supported yet on upstream Linux so this was tested on a downstream
      kernel.
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Signed-off-by: NHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ea26b427
    • D
      ARM: dts: rockchip: Make rk3288-veyron-mickey's emmc work again · 22befe67
      Douglas Anderson 提交于
      [ Upstream commit 99fa066710f75f18f4d9a5bc5f6a711968a581d5 ]
      
      When I try to boot rk3288-veyron-mickey I totally fail to make the
      eMMC work.  Specifically my logs (on Chrome OS 4.19):
      
        mmc_host mmc1: card is non-removable.
        mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
        mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, actual 50000000HZ div = 0)
        mmc1: switch to bus width 8 failed
        mmc1: switch to bus width 4 failed
        mmc1: new high speed MMC card at address 0001
        mmcblk1: mmc1:0001 HAG2e 14.7 GiB
        mmcblk1boot0: mmc1:0001 HAG2e partition 1 4.00 MiB
        mmcblk1boot1: mmc1:0001 HAG2e partition 2 4.00 MiB
        mmcblk1rpmb: mmc1:0001 HAG2e partition 3 4.00 MiB, chardev (243:0)
        mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
        mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, actual 50000000HZ div = 0)
        mmc1: switch to bus width 8 failed
        mmc1: switch to bus width 4 failed
        mmc1: tried to HW reset card, got error -110
        mmcblk1: error -110 requesting status
        mmcblk1: recovery failed!
        print_req_error: I/O error, dev mmcblk1, sector 0
        ...
      
      When I remove the '/delete-property/mmc-hs200-1_8v' then everything is
      hunky dory.
      
      That line comes from the original submission of the mickey dts
      upstream, so presumably at the time the HS200 was failing and just
      enumerating things as a high speed device was fine.  ...or maybe it's
      just that some mickey devices work when enumerating at "high speed",
      just not mine?
      
      In any case, hs200 seems good now.  Let's turn it on.
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Signed-off-by: NHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      22befe67
    • D
      ARM: dts: rockchip: Make rk3288-veyron-minnie run at hs200 · 8c5a33d3
      Douglas Anderson 提交于
      [ Upstream commit 1c0479023412ab7834f2e98b796eb0d8c627cd62 ]
      
      As some point hs200 was failing on rk3288-veyron-minnie.  See commit
      98492678 ("ARM: dts: rockchip: temporarily remove emmc hs200 speed
      from rk3288 minnie").  Although I didn't track down exactly when it
      started working, it seems to work OK now, so let's turn it back on.
      
      To test this, I booted from SD card and then used this script to
      stress the enumeration process after fixing a memory leak [1]:
        cd /sys/bus/platform/drivers/dwmmc_rockchip
        for i in $(seq 1 3000); do
          echo "========================" $i
          echo ff0f0000.dwmmc > unbind
          sleep .5
          echo ff0f0000.dwmmc > bind
          while true; do
            if [ -e /dev/mmcblk2 ]; then
              break;
            fi
            sleep .1
          done
        done
      
      It worked fine.
      
      [1] https://lkml.kernel.org/r/20190503233526.226272-1-dianders@chromium.orgSigned-off-by: NDouglas Anderson <dianders@chromium.org>
      Signed-off-by: NHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8c5a33d3
    • R
      ARM: riscpc: fix DMA · 3c1d1bad
      Russell King 提交于
      [ Upstream commit ffd9a1ba9fdb7f2bd1d1ad9b9243d34e96756ba2 ]
      
      DMA got broken a while back in two different ways:
      1) a change in the behaviour of disable_irq() to wait for the interrupt
         to finish executing causes us to deadlock at the end of DMA.
      2) a change to avoid modifying the scatterlist left the first transfer
         uninitialised.
      
      DMA is only used with expansion cards, so has gone unnoticed.
      
      Fixes: fa4e9989 ("[ARM] dma: RiscPC: don't modify DMA SG entries")
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3c1d1bad
  2. 04 8月, 2019 2 次提交
  3. 31 7月, 2019 15 次提交
    • M
      powerpc/tm: Fix oops on sigreturn on systems without TM · b993a66d
      Michael Neuling 提交于
      commit f16d80b75a096c52354c6e0a574993f3b0dfbdfe upstream.
      
      On systems like P9 powernv where we have no TM (or P8 booted with
      ppc_tm=off), userspace can construct a signal context which still has
      the MSR TS bits set. The kernel tries to restore this context which
      results in the following crash:
      
        Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033
        Oops: Unrecoverable exception, sig: 6 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69
        NIP:  c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000
        REGS: c00000003fffbd70 TRAP: 0700   Not tainted  (5.2.0-11045-g7142b497d8)
        MSR:  8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]>  CR: 42004242  XER: 00000000
        CFAR: c0000000000022e0 IRQMASK: 0
        GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669
        GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8
        GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000
        GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420
        GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000
        GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000
        GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728
        NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80
        LR [00007fffb2d67e48] 0x7fffb2d67e48
        Call Trace:
        Instruction dump:
        e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
        e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18
      
      The problem is the signal code assumes TM is enabled when
      CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as
      with P9 powernv or if `ppc_tm=off` is used on P8.
      
      This means any local user can crash the system.
      
      Fix the problem by returning a bad stack frame to the user if they try
      to set the MSR TS bits with sigreturn() on systems where TM is not
      supported.
      
      Found with sigfuz kernel selftest on P9.
      
      This fixes CVE-2019-13648.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Cc: stable@vger.kernel.org # v3.9
      Reported-by: NPraveen Pandey <Praveen.Pandey@in.ibm.com>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b993a66d
    • G
      powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() · b9310c56
      Gautham R. Shenoy 提交于
      commit 4d202c8c8ed3822327285747db1765967110b274 upstream.
      
      xive_find_target_in_mask() has the following for(;;) loop which has a
      bug when @first == cpumask_first(@mask) and condition 1 fails to hold
      for every CPU in @mask. In this case we loop forever in the for-loop.
      
        first = cpu;
        for (;;) {
        	  if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
      		  return cpu;
      	  cpu = cpumask_next(cpu, mask);
      	  if (cpu == first) // condition 2
      		  break;
      
      	  if (cpu >= nr_cpu_ids) // condition 3
      		  cpu = cpumask_first(mask);
        }
      
      This is because, when @first == cpumask_first(@mask), we never hit the
      condition 2 (cpu == first) since prior to this check, we would have
      executed "cpu = cpumask_next(cpu, mask)" which will set the value of
      @cpu to a value greater than @first or to nr_cpus_ids. When this is
      coupled with the fact that condition 1 is not met, we will never exit
      this loop.
      
      This was discovered by the hard-lockup detector while running LTP test
      concurrently with SMT switch tests.
      
       watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
       watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
       watchdog: CPU 68 Hard LOCKUP
       watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
       CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
       NIP:  c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
       REGS: c000201fff3c7d80 TRAP: 0100   Not tainted  (4.18.0-100.el8.ppc64le)
       MSR:  9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 24028424  XER: 00000000
       CFAR: c0000000006f558c IRQMASK: 1
       GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
       GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
       GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
       GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
       GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
       GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
       GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
       GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
       NIP [c0000000006f5578] find_next_bit+0x38/0x90
       LR [c000000000cba9ec] cpumask_next+0x2c/0x50
       Call Trace:
       [c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
       [c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
       [c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
       [c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
       [c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
       [c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
       [c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
       [c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
       [c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
       [c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
       [c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
       [c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
       [c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
       [c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
       [c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
       [c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
       [c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
       Instruction dump:
       78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
       4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040
      
      To fix this, move the check for condition 2 after the check for
      condition 3, so that we are able to break out of the loop soon after
      iterating through all the CPUs in the @mask in the problem case. Use
      do..while() to achieve this.
      
      Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Reported-by: NIndira P. Joga <indira.priya@in.ibm.com>
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9310c56
    • Z
      x86/speculation/mds: Apply more accurate check on hypervisor platform · 7d20e3ba
      Zhenzhong Duan 提交于
      commit 517c3ba00916383af6411aec99442c307c23f684 upstream.
      
      X86_HYPER_NATIVE isn't accurate for checking if running on native platform,
      e.g. CONFIG_HYPERVISOR_GUEST isn't set or "nopv" is enabled.
      
      Checking the CPU feature bit X86_FEATURE_HYPERVISOR to determine if it's
      running on native platform is more accurate.
      
      This still doesn't cover the platforms on which X86_FEATURE_HYPERVISOR is
      unsupported, e.g. VMware, but there is nothing which can be done about this
      scenario.
      
      Fixes: 8a4b06d391b0 ("x86/speculation/mds: Add sysfs reporting for MDS")
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1564022349-17338-1-git-send-email-zhenzhong.duan@oracle.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d20e3ba
    • H
      x86/sysfb_efi: Add quirks for some devices with swapped width and height · 5e87e8b4
      Hans de Goede 提交于
      commit d02f1aa39189e0619c3525d5cd03254e61bf606a upstream.
      
      Some Lenovo 2-in-1s with a detachable keyboard have a portrait screen but
      advertise a landscape resolution and pitch, resulting in a messed up
      display if the kernel tries to show anything on the efifb (because of the
      wrong pitch).
      
      Fix this by adding a new DMI match table for devices which need to have
      their width and height swapped.
      
      At first it was tried to use the existing table for overriding some of the
      efifb parameters, but some of the affected devices have variants with
      different LCD resolutions which will not work with hardcoded override
      values.
      
      Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1730783Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190721152418.11644-1-hdegoede@redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e87e8b4
    • S
      sh: prevent warnings when using iounmap · 7bd5902a
      Sam Ravnborg 提交于
      [ Upstream commit 733f0025f0fb43e382b84db0930ae502099b7e62 ]
      
      When building drm/exynos for sh, as part of an allmodconfig build, the
      following warning triggered:
      
        exynos7_drm_decon.c: In function `decon_remove':
        exynos7_drm_decon.c:769:24: warning: unused variable `ctx'
          struct decon_context *ctx = dev_get_drvdata(&pdev->dev);
      
      The ctx variable is only used as argument to iounmap().
      
      In sh - allmodconfig CONFIG_MMU is not defined
      so it ended up in:
      
      \#define __iounmap(addr)	do { } while (0)
      \#define iounmap		__iounmap
      
      Fix the warning by introducing a static inline function for iounmap.
      
      This is similar to several other architectures.
      
      Link: http://lkml.kernel.org/r/20190622114208.24427-1-sam@ravnborg.orgSigned-off-by: NSam Ravnborg <sam@ravnborg.org>
      Reviewed-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Inki Dae <inki.dae@samsung.com>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      7bd5902a
    • O
      powerpc/eeh: Handle hugepages in ioremap space · 7f775a67
      Oliver O'Halloran 提交于
      [ Upstream commit 33439620680be5225c1b8806579a291e0d761ca0 ]
      
      In commit 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap
      space") support for using hugepages in the vmalloc and ioremap areas was
      enabled for radix. Unfortunately this broke EEH MMIO error checking.
      
      Detection works by inserting a hook which checks the results of the
      ioreadXX() set of functions.  When a read returns a 0xFFs response we
      need to check for an error which we do by mapping the (virtual) MMIO
      address back to a physical address, then mapping physical address to a
      PCI device via an interval tree.
      
      When translating virt -> phys we currently assume the ioremap space is
      only populated by PAGE_SIZE mappings. If a hugepage mapping is found we
      emit a WARN_ON(), but otherwise handles the check as though a normal
      page was found. In pathalogical cases such as copying a buffer
      containing a lot of 0xFFs from BAR memory this can result in the system
      not booting because it's too busy printing WARN_ON()s.
      
      There's no real reason to assume huge pages can't be present and we're
      prefectly capable of handling them, so do that.
      
      Fixes: 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap space")
      Reported-by: NSachin Sant <sachinp@linux.vnet.ibm.com>
      Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
      Tested-by: NSachin Sant <sachinp@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190710150517.27114-1-oohall@gmail.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      7f775a67
    • M
      powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h · 4b9dc73a
      Masahiro Yamada 提交于
      [ Upstream commit 9e005b761e7ad153dcf40a6cba1d681fe0830ac6 ]
      
      The next commit will make the way of passing CONFIG options more robust.
      Unfortunately, it would uncover another hidden issue; without this
      commit, skiroot_defconfig would be broken like this:
      
      |   WRAP    arch/powerpc/boot/zImage.pseries
      | arch/powerpc/boot/wrapper.a(decompress.o): In function `bcj_powerpc.isra.10':
      | decompress.c:(.text+0x720): undefined reference to `get_unaligned_be32'
      | decompress.c:(.text+0x7a8): undefined reference to `put_unaligned_be32'
      | make[1]: *** [arch/powerpc/boot/Makefile;383: arch/powerpc/boot/zImage.pseries] Error 1
      | make: *** [arch/powerpc/Makefile;295: zImage] Error 2
      
      skiroot_defconfig is the only defconfig that enables CONFIG_KERNEL_XZ
      for ppc, which has never been correctly built before.
      
      I figured out the root cause in lib/decompress_unxz.c:
      
      | #ifdef CONFIG_PPC
      | #      define XZ_DEC_POWERPC
      | #endif
      
      CONFIG_PPC is undefined here in the ppc bootwrapper because autoconf.h
      is not included except by arch/powerpc/boot/serial.c
      
      XZ_DEC_POWERPC is not defined, therefore, bcj_powerpc() is not compiled
      for the bootwrapper.
      
      With the next commit passing CONFIG_PPC correctly, we would realize that
      {get,put}_unaligned_be32 was missing.
      
      Unlike the other decompressors, the ppc bootwrapper duplicates all the
      necessary helpers in arch/powerpc/boot/.
      
      The other architectures define __KERNEL__ and pull in helpers for
      building the decompressors.
      
      If ppc bootwrapper had defined __KERNEL__, lib/xz/xz_private.h would
      have included <asm/unaligned.h>:
      
      | #ifdef __KERNEL__
      | #       include <linux/xz.h>
      | #       include <linux/kernel.h>
      | #       include <asm/unaligned.h>
      
      However, doing so would cause tons of definition conflicts since the
      bootwrapper has duplicated everything.
      
      I just added copies of {get,put}_unaligned_be32, following the
      bootwrapper coding convention.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190705100144.28785-1-yamada.masahiro@socionext.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      4b9dc73a
    • J
      arm64: assembler: Switch ESB-instruction with a vanilla nop if !ARM64_HAS_RAS · 05959ed8
      James Morse 提交于
      [ Upstream commit 2b68a2a963a157f024c67c0697b16f5f792c8a35 ]
      
      The ESB-instruction is a nop on CPUs that don't implement the RAS
      extensions. This lets us use it in places like the vectors without
      having to use alternatives.
      
      If someone disables CONFIG_ARM64_RAS_EXTN, this instruction still has
      its RAS extensions behaviour, but we no longer read DISR_EL1 as this
      register does depend on alternatives.
      
      This could go wrong if we want to synchronize an SError from a KVM
      guest. On a CPU that has the RAS extensions, but the KConfig option
      was disabled, we consume the pending SError with no chance of ever
      reading it.
      
      Hide the ESB-instruction behind the CONFIG_ARM64_RAS_EXTN option,
      outputting a regular nop if the feature has been disabled.
      Reported-by: NJulien Thierry <julien.thierry@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      05959ed8
    • A
      powerpc/mm: Handle page table allocation failures · d48720ba
      Aneesh Kumar K.V 提交于
      [ Upstream commit 2230ebf6e6dd0b7751e2921b40f6cfe34f09bb16 ]
      
      This fixes kernel crash that arises due to not handling page table allocation
      failures while allocating hugetlb page table.
      
      Fixes: e2b3d202 ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d48720ba
    • C
      powerpc/4xx/uic: clear pending interrupt after irq type/pol change · 52373ab6
      Christian Lamparter 提交于
      [ Upstream commit 3ab3a0689e74e6aa5b41360bc18861040ddef5b1 ]
      
      When testing out gpio-keys with a button, a spurious
      interrupt (and therefore a key press or release event)
      gets triggered as soon as the driver enables the irq
      line for the first time.
      
      This patch clears any potential bogus generated interrupt
      that was caused by the switching of the associated irq's
      type and polarity.
      Signed-off-by: NChristian Lamparter <chunkeey@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      52373ab6
    • J
      um: Silence lockdep complaint about mmap_sem · 74520144
      Johannes Berg 提交于
      [ Upstream commit 80bf6ceaf9310b3f61934c69b382d4912deee049 ]
      
      When we get into activate_mm(), lockdep complains that we're doing
      something strange:
      
          WARNING: possible circular locking dependency detected
          5.1.0-10252-gb00152307319-dirty #121 Not tainted
          ------------------------------------------------------
          inside.sh/366 is trying to acquire lock:
          (____ptrval____) (&(&p->alloc_lock)->rlock){+.+.}, at: flush_old_exec+0x703/0x8d7
      
          but task is already holding lock:
          (____ptrval____) (&mm->mmap_sem){++++}, at: flush_old_exec+0x6c5/0x8d7
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&mm->mmap_sem){++++}:
                 [...]
                 __lock_acquire+0x12ab/0x139f
                 lock_acquire+0x155/0x18e
                 down_write+0x3f/0x98
                 flush_old_exec+0x748/0x8d7
                 load_elf_binary+0x2ca/0xddb
                 [...]
      
          -> #0 (&(&p->alloc_lock)->rlock){+.+.}:
                 [...]
                 __lock_acquire+0x12ab/0x139f
                 lock_acquire+0x155/0x18e
                 _raw_spin_lock+0x30/0x83
                 flush_old_exec+0x703/0x8d7
                 load_elf_binary+0x2ca/0xddb
                 [...]
      
          other info that might help us debug this:
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(&mm->mmap_sem);
                                         lock(&(&p->alloc_lock)->rlock);
                                         lock(&mm->mmap_sem);
            lock(&(&p->alloc_lock)->rlock);
      
           *** DEADLOCK ***
      
          2 locks held by inside.sh/366:
           #0: (____ptrval____) (&sig->cred_guard_mutex){+.+.}, at: __do_execve_file+0x12d/0x869
           #1: (____ptrval____) (&mm->mmap_sem){++++}, at: flush_old_exec+0x6c5/0x8d7
      
          stack backtrace:
          CPU: 0 PID: 366 Comm: inside.sh Not tainted 5.1.0-10252-gb00152307319-dirty #121
          Stack:
           [...]
          Call Trace:
           [<600420de>] show_stack+0x13b/0x155
           [<6048906b>] dump_stack+0x2a/0x2c
           [<6009ae64>] print_circular_bug+0x332/0x343
           [<6009c5c6>] check_prev_add+0x669/0xdad
           [<600a06b4>] __lock_acquire+0x12ab/0x139f
           [<6009f3d0>] lock_acquire+0x155/0x18e
           [<604a07e0>] _raw_spin_lock+0x30/0x83
           [<60151e6a>] flush_old_exec+0x703/0x8d7
           [<601a8eb8>] load_elf_binary+0x2ca/0xddb
           [...]
      
      I think it's because in exec_mmap() we have
      
      	down_read(&old_mm->mmap_sem);
      ...
              task_lock(tsk);
      ...
      	activate_mm(active_mm, mm);
      	(which does down_write(&mm->mmap_sem))
      
      I'm not really sure why lockdep throws in the whole knowledge
      about the task lock, but it seems that old_mm and mm shouldn't
      ever be the same (and it doesn't deadlock) so tell lockdep that
      they're different.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      74520144
    • N
      powerpc/xmon: Fix disabling tracing while in xmon · 9fac3948
      Naveen N. Rao 提交于
      [ Upstream commit aaf06665f7ea3ee9f9754e16c1a507a89f1de5b1 ]
      
      Commit ed49f7fd ("powerpc/xmon: Disable tracing when entering
      xmon") added code to disable recording trace entries while in xmon. The
      commit introduced a variable 'tracing_enabled' to record if tracing was
      enabled on xmon entry, and used this to conditionally enable tracing
      during exit from xmon.
      
      However, we are not checking the value of 'fromipi' variable in
      xmon_core() when setting 'tracing_enabled'. Due to this, when secondary
      cpus enter xmon, they will see tracing as being disabled already and
      tracing won't be re-enabled on exit. Fix the same.
      
      Fixes: ed49f7fd ("powerpc/xmon: Disable tracing when entering xmon")
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      9fac3948
    • Q
      powerpc/cacheflush: fix variable set but not used · a80f67d5
      Qian Cai 提交于
      [ Upstream commit 04db3ede40ae4fc23a5c4237254c4a53bbe4c1f2 ]
      
      The powerpc's flush_cache_vmap() is defined as a macro and never use
      both of its arguments, so it will generate a compilation warning,
      
      lib/ioremap.c: In function 'ioremap_page_range':
      lib/ioremap.c:203:16: warning: variable 'start' set but not used
      [-Wunused-but-set-variable]
      
      Fix it by making it an inline function.
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a80f67d5
    • A
      powerpc/pci/of: Fix OF flags parsing for 64bit BARs · 216462fa
      Alexey Kardashevskiy 提交于
      [ Upstream commit df5be5be8735ef2ae80d5ae1f2453cd81a035c4b ]
      
      When the firmware does PCI BAR resource allocation, it passes the assigned
      addresses and flags (prefetch/64bit/...) via the "reg" property of
      a PCI device device tree node so the kernel does not need to do
      resource allocation.
      
      The flags are stored in resource::flags - the lower byte stores
      PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
      Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
      such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
      When parsing the "reg" property, we copy the prefetch flag but we skip
      on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.
      
      The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
      1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
      or by passing "/chosen/linux,pci-probe-only");
      2. we request resource alignment (by passing pci=resource_alignment=
      via the kernel cmd line to request PAGE_SIZE alignment or defining
      ppc_md.pcibios_default_alignment which returns anything but 0). Note that
      the alignment requests are ignored if PCI_PROBE_ONLY is enabled.
      
      With 1) and 2), the generic PCI code in the kernel unconditionally
      decides to:
      - reassign the BARs in pci_specified_resource_alignment() (works fine)
      - write new BARs to the device - this fails for 64bit BARs as the generic
      code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
      of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
      in the hypervisor.
      
      This fixes the issue by copying the flag. This is useful if we want to
      enforce certain BAR alignment per platform as handling subpage sized BARs
      is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NSam Bobroff <sbobroff@linux.ibm.com>
      Reviewed-by: NOliver O'Halloran <oohall@gmail.com>
      Reviewed-by: NShawn Anastasio <shawn@anastas.io>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      216462fa
    • N
      powerpc/pseries/mobility: prevent cpu hotplug during DT update · fd0d171c
      Nathan Lynch 提交于
      [ Upstream commit e59a175faa8df9d674247946f2a5a9c29c835725 ]
      
      CPU online/offline code paths are sensitive to parts of the device
      tree (various cpu node properties, cache nodes) that can be changed as
      a result of a migration.
      
      Prevent CPU hotplug while the device tree potentially is inconsistent.
      
      Fixes: 410bccf9 ("powerpc/pseries: Partition migration in the kernel")
      Signed-off-by: NNathan Lynch <nathanl@linux.ibm.com>
      Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      fd0d171c
  4. 28 7月, 2019 3 次提交
  5. 26 7月, 2019 10 次提交
    • N
      powerpc/pseries: Fix oops in hotplug memory notifier · 01982f7b
      Nathan Lynch 提交于
      commit 0aa82c482ab2ece530a6f44897b63b274bb43c8e upstream.
      
      During post-migration device tree updates, we can oops in
      pseries_update_drconf_memory() if the source device tree has an
      ibm,dynamic-memory-v2 property and the destination has a
      ibm,dynamic_memory (v1) property. The notifier processes an "update"
      for the ibm,dynamic-memory property but it's really an add in this
      scenario. So make sure the old property object is there before
      dereferencing it.
      
      Fixes: 2b31e3ae ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: NNathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01982f7b
    • G
      powerpc/powernv/npu: Fix reference leak · e725502b
      Greg Kurz 提交于
      commit 02c5f5394918b9b47ff4357b1b18335768cd867d upstream.
      
      Since 902bdc57, get_pci_dev() calls pci_get_domain_bus_and_slot(). This
      has the effect of incrementing the reference count of the PCI device, as
      explained in drivers/pci/search.c:
      
       * Given a PCI domain, bus, and slot/function number, the desired PCI
       * device is located in the list of PCI devices. If the device is
       * found, its reference count is increased and this function returns a
       * pointer to its data structure.  The caller must decrement the
       * reference count by calling pci_dev_put().  If no device is found,
       * %NULL is returned.
      
      Nothing was done to call pci_dev_put() and the reference count of GPU and
      NPU PCI devices rockets up.
      
      A natural way to fix this would be to teach the callers about the change,
      so that they call pci_dev_put() when done with the pointer. This turns
      out to be quite intrusive, as it affects many paths in npu-dma.c,
      pci-ioda.c and vfio_pci_nvlink2.c. Also, the issue appeared in 4.16 and
      some affected code got moved around since then: it would be problematic
      to backport the fix to stable releases.
      
      All that code never cared for reference counting anyway. Call pci_dev_put()
      from get_pci_dev() to revert to the previous behavior.
      
      Fixes: 902bdc57 ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn")
      Cc: stable@vger.kernel.org # v4.16
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e725502b
    • R
      powerpc/watchpoint: Restore NV GPRs while returning from exception · 1e3b61cb
      Ravi Bangoria 提交于
      commit f474c28fbcbe42faca4eb415172c07d76adcb819 upstream.
      
      powerpc hardware triggers watchpoint before executing the instruction.
      To make trigger-after-execute behavior, kernel emulates the
      instruction. If the instruction is 'load something into non-volatile
      register', exception handler should restore emulated register state
      while returning back, otherwise there will be register state
      corruption. eg, adding a watchpoint on a list can corrput the list:
      
        # cat /proc/kallsyms | grep kthread_create_list
        c00000000121c8b8 d kthread_create_list
      
      Add watchpoint on kthread_create_list->prev:
      
        # perf record -e mem:0xc00000000121c8c0
      
      Run some workload such that new kthread gets invoked. eg, I just
      logged out from console:
      
        list_add corruption. next->prev should be prev (c000000001214e00), \
      	but was c00000000121c8b8. (next=c00000000121c8b8).
        WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
        CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
        ...
        NIP __list_add_valid+0xb4/0xc0
        LR __list_add_valid+0xb0/0xc0
        Call Trace:
        __list_add_valid+0xb0/0xc0 (unreliable)
        __kthread_create_on_node+0xe0/0x260
        kthread_create_on_node+0x34/0x50
        create_worker+0xe8/0x260
        worker_thread+0x444/0x560
        kthread+0x160/0x1a0
        ret_from_kernel_thread+0x5c/0x70
      
      List corruption happened because it uses 'load into non-volatile
      register' instruction:
      
      Snippet from __kthread_create_on_node:
      
        c000000000136be8:     addis   r29,r2,-19
        c000000000136bec:     ld      r29,31424(r29)
              if (!__list_add_valid(new, prev, next))
        c000000000136bf0:     mr      r3,r30
        c000000000136bf4:     mr      r5,r28
        c000000000136bf8:     mr      r4,r29
        c000000000136bfc:     bl      c00000000059a2f8 <__list_add_valid+0x8>
      
      Register state from WARN_ON():
      
        GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075
        GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000
        GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa
        GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00
        GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370
        GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000
        GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628
        GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0
      
      Watchpoint hit at 0xc000000000136bec.
      
        addis   r29,r2,-19
         => r29 = 0xc000000001344e00 + (-19 << 16)
         => r29 = 0xc000000001214e00
      
        ld      r29,31424(r29)
         => r29 = *(0xc000000001214e00 + 31424)
         => r29 = *(0xc00000000121c8c0)
      
      0xc00000000121c8c0 is where we placed a watchpoint and thus this
      instruction was emulated by emulate_step. But because handle_dabr_fault
      did not restore emulated register state, r29 still contains stale
      value in above register state.
      
      Fixes: 5aae8a53 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors")
      Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: stable@vger.kernel.org # 2.6.36+
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e3b61cb
    • C
      powerpc/32s: fix suspend/resume when IBATs 4-7 are used · 237ac0d7
      Christophe Leroy 提交于
      commit 6ecb78ef56e08d2119d337ae23cb951a640dc52d upstream.
      
      Previously, only IBAT1 and IBAT2 were used to map kernel linear mem.
      Since commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for
      STRICT_KERNEL_RWX"), we may have all 8 BATs used for mapping
      kernel text. But the suspend/restore functions only save/restore
      BATs 0 to 3, and clears BATs 4 to 7.
      
      Make suspend and restore functions respectively save and reload
      the 8 BATs on CPUs having MMU_FTR_USE_HIGH_BATS feature.
      Reported-by: NAndreas Schwab <schwab@linux-m68k.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      237ac0d7
    • H
      parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1 · 79619817
      Helge Deller 提交于
      commit 10835c854685393a921b68f529bf740fa7c9984d upstream.
      
      On parisc the privilege level of a process is stored in the lowest two bits of
      the instruction pointers (IAOQ0 and IAOQ1). On Linux we use privilege level 0
      for the kernel and privilege level 3 for user-space. So userspace should not be
      allowed to modify IAOQ0 or IAOQ1 of a ptraced process to change it's privilege
      level to e.g. 0 to try to gain kernel privileges.
      
      This patch prevents such modifications by always setting the two lowest bits to
      one (which relates to privilege level 3 for user-space) if IAOQ0 or IAOQ1 are
      modified via ptrace calls in the native and compat ptrace paths.
      
      Link: https://bugs.gentoo.org/481768Reported-by: NJeroen Roovers <jer@gentoo.org>
      Cc: <stable@vger.kernel.org>
      Tested-by: NRolf Eike Beer <eike-kernel@sf-tec.de>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79619817
    • H
      parisc: Ensure userspace privilege for ptraced processes in regset functions · a6a0daa7
      Helge Deller 提交于
      commit 34c32fc603311a72cb558e5e337555434f64c27b upstream.
      
      On parisc the privilege level of a process is stored in the lowest two bits of
      the instruction pointers (IAOQ0 and IAOQ1). On Linux we use privilege level 0
      for the kernel and privilege level 3 for user-space. So userspace should not be
      allowed to modify IAOQ0 or IAOQ1 of a ptraced process to change it's privilege
      level to e.g. 0 to try to gain kernel privileges.
      
      This patch prevents such modifications in the regset support functions by
      always setting the two lowest bits to one (which relates to privilege level 3
      for user-space) if IAOQ0 or IAOQ1 are modified via ptrace regset calls.
      
      Link: https://bugs.gentoo.org/481768
      Cc: <stable@vger.kernel.org> # v4.7+
      Tested-by: NRolf Eike Beer <eike-kernel@sf-tec.de>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6a0daa7
    • K
      perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs · 9854e068
      Kim Phillips 提交于
      commit 2f217d58a8a086d3399fecce39fb358848e799c4 upstream.
      
      Fill in the L3 performance event select register ThreadMask
      bitfield, to enable per hardware thread accounting.
      Signed-off-by: NKim Phillips <kim.phillips@amd.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Gary Hook <Gary.Hook@amd.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liska <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190628215906.4276-2-kim.phillips@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9854e068
    • K
      perf/x86/amd/uncore: Do not set 'ThreadMask' and 'SliceMask' for non-L3 PMCs · 82c46f7b
      Kim Phillips 提交于
      commit 16f4641166b10e199f0d7b68c2c5f004fef0bda3 upstream.
      
      The following commit:
      
        d7cbbe49 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events")
      
      enables L3 PMC events for all threads and slices by writing 1's in
      'ChL3PmcCfg' (L3 PMC PERF_CTL) register fields.
      
      Those bitfields overlap with high order event select bits in the Data
      Fabric PMC control register, however.
      
      So when a user requests raw Data Fabric events (-e amd_df/event=0xYYY/),
      the two highest order bits get inadvertently set, changing the counter
      select to events that don't exist, and for which no counts are read.
      
      This patch changes the logic to write the L3 masks only when dealing
      with L3 PMC counters.
      
      AMD Family 16h and below Northbridge (NB) counters were not affected.
      Signed-off-by: NKim Phillips <kim.phillips@amd.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Gary Hook <Gary.Hook@amd.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liska <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: d7cbbe49 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events")
      Link: https://lkml.kernel.org/r/20190628215906.4276-1-kim.phillips@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      82c46f7b
    • K
      perf/x86/intel: Fix spurious NMI on fixed counter · a847a522
      Kan Liang 提交于
      commit e4557c1a46b0d32746bd309e1941914b5a6912b4 upstream.
      
      If a user first sample a PEBS event on a fixed counter, then sample a
      non-PEBS event on the same fixed counter on Icelake, it will trigger
      spurious NMI. For example:
      
        perf record -e 'cycles:p' -a
        perf record -e 'cycles' -a
      
      The error message for spurious NMI:
      
        [June 21 15:38] Uhhuh. NMI received for unknown reason 30 on CPU 2.
        [    +0.000000] Do you have a strange power saving mode enabled?
        [    +0.000000] Dazed and confused, but trying to continue
      
      The bug was introduced by the following commit:
      
        commit 6f55967ad9d9 ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
      
      The commit moves the intel_pmu_pebs_disable() after intel_pmu_disable_fixed(),
      which returns immediately.  The related bit of PEBS_ENABLE MSR will never be
      cleared for the fixed counter. Then a non-PEBS event runs on the fixed counter,
      but the bit on PEBS_ENABLE is still set, which triggers spurious NMIs.
      
      Check and disable PEBS for fixed counters after intel_pmu_disable_fixed().
      Reported-by: NYi, Ammy <ammy.yi@intel.com>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 6f55967ad9d9 ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
      Link: https://lkml.kernel.org/r/20190625142135.22112-1-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a847a522
    • D
      x86/boot: Fix memory leak in default_get_smp_config() · 0d4c0bb7
      David Rientjes 提交于
      commit e74bd96989dd42a51a73eddb4a5510a6f5e42ac3 upstream.
      
      When default_get_smp_config() is called with early == 1 and mpf->feature1
      is non-zero, mpf is leaked because the return path does not do
      early_memunmap().
      
      Fix this and share a common exit routine.
      
      Fixes: 5997efb9 ("x86/boot: Use memremap() to map the MPF and MPC data")
      Reported-by: NCfir Cohen <cfir@google.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907091942570.28240@chino.kir.corp.google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d4c0bb7