1. 12 September 2022 (1 commit)
    •
      kernel: exit: cleanup release_thread() · 2be9880d
      Committed by Kefeng Wang
      Only x86 has its own release_thread(); introduce a new weak
      release_thread() function to clean up the empty definitions in the
      other architectures.
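      
      The change boils down to a no-op weak default in common code; a
      minimal sketch of what the commit adds:
      
        /* kernel/exit.c: no-op weak default; x86 keeps its own
         * release_thread() definition, which overrides this one. */
        void __weak release_thread(struct task_struct *dead_task)
        {
        }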
      
      Link: https://lkml.kernel.org/r/20220819014406.32266-1-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Guo Ren <guoren@kernel.org>				[csky]
      Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Brian Cain <bcain@quicinc.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>			[powerpc]
      Acked-by: Stafford Horne <shorne@gmail.com>			[openrisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>		[arm64]
      Acked-by: Huacai Chen <chenhuacai@kernel.org>			[LoongArch]
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Guo Ren <guoren@kernel.org> [csky]
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Henderson <richard.henderson@linaro.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xuerui Wang <kernel@xen0n.name>
      Cc: Yoshinori Sato <ysato@users.osdn.me>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2be9880d
  2. 05 May 2022 (1 commit)
  3. 08 March 2022 (1 commit)
  4. 09 December 2021 (2 commits)
  5. 02 December 2021 (1 commit)
  6. 15 October 2021 (1 commit)
  7. 16 June 2021 (2 commits)
    •
      powerpc/64: drop redundant definition of spin_until_cond · db8f7066
      Committed by Sudeep Holla
      linux/processor.h has exactly the same definition of spin_until_cond.
      Drop the redundant definition in asm/processor.h.
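      
      For reference, the surviving generic version in
      include/linux/processor.h looks roughly like this (abridged):
      
        #ifndef spin_until_cond
        #define spin_until_cond(cond)                           \
        do {                                                    \
                if (unlikely(!(cond))) {                        \
                        spin_begin();                           \
                        do {                                    \
                                spin_cpu_relax();               \
                        } while (!(cond));                      \
                        spin_end();                             \
                }                                               \
        } while (0)
        #endif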
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1fff2054e5dfc00329804dbd3f2a91667c9a8aff.1623438544.git.christophe.leroy@csgroup.eu
      db8f7066
    •
      powerpc/32s: Rework Kernel Userspace Access Protection · 16132529
      Committed by Christophe Leroy
      On book3s/32, KUAP is provided by toggling Ks bit in segment registers.
      One segment register addresses 256M of virtual memory.
      
      At present, KUAP implements complex logic to apply the unlock/lock
      on the exact number of segments covering the user range being
      accessed, saving the boundaries of that range of segments in a
      member of the thread struct.
      
      But most if not all user accesses are within a single segment.
      
      Rework KUAP with a different approach:
      - Open only one segment, the one corresponding to the starting
      address of the range to be accessed.
      - If a second segment is involved, it will generate a page fault; the
      segment will then be opened by the page fault handler.
      
      The kuap member of the thread struct will now contain:
      - the start address of the current ongoing user access, used to know
      which segment to lock at the end of the user access;
      - ~0 when no user access is open;
      - ~1 when additional segments are opened by a page fault.
      
      Then, at lock time:
      - When only one segment is open, close it.
      - When several segments are open, close all user segments.
      
      Almost 100% of the time, only one segment will be involved.
      
      In interrupts, inline the functions that unlock/lock all segments,
      because not inlining them implies a lot of register saving and
      restoring.
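      
      A hedged C sketch of the resulting flow (helper names illustrative,
      except kuap_lock_all_ool, which appears in the disassembly below):
      
        #define KUAP_NONE       (~0UL)  /* no user access open */
        #define KUAP_ALL        (~1UL)  /* a fault opened extra segments */
        
        static inline void kuap_open(unsigned long addr)
        {
                current->thread.kuap = addr;
                unlock_user_segment(addr);      /* clear Ks in one SR + isync */
        }
        
        static inline void kuap_close(void)
        {
                unsigned long kuap = current->thread.kuap;
        
                current->thread.kuap = KUAP_NONE;
                if (kuap == KUAP_ALL)
                        kuap_lock_all_ool();     /* rare: close all user SRs */
                else if (kuap != KUAP_NONE)
                        lock_user_segment(kuap); /* common: close one SR */
        }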
      
      With the patch, writing the value 128 to userspace in
      perf_copy_attr() is done with 16 instructions:
      
          3890:	93 82 04 dc 	stw     r28,1244(r2)
          3894:	7d 20 e5 26 	mfsrin  r9,r28
          3898:	55 29 00 80 	rlwinm  r9,r9,0,2,0
          389c:	7d 20 e1 e4 	mtsrin  r9,r28
          38a0:	4c 00 01 2c 	isync
      
          38a4:	39 20 00 80 	li      r9,128
          38a8:	91 3c 00 00 	stw     r9,0(r28)
      
          38ac:	81 42 04 dc 	lwz     r10,1244(r2)
          38b0:	39 00 ff ff 	li      r8,-1
          38b4:	91 02 04 dc 	stw     r8,1244(r2)
          38b8:	2c 0a ff fe 	cmpwi   r10,-2
          38bc:	41 82 00 88 	beq     3944 <perf_copy_attr+0x36c>
          38c0:	7d 20 55 26 	mfsrin  r9,r10
          38c4:	65 29 40 00 	oris    r9,r9,16384
          38c8:	7d 20 51 e4 	mtsrin  r9,r10
          38cc:	4c 00 01 2c 	isync
      ...
          3944:	48 00 00 01 	bl      3944 <perf_copy_attr+0x36c>
      			3944: R_PPC_REL24	kuap_lock_all_ool
      
      Before the patch it was 118 instructions. In reality only 42 are
      executed in most cases, but GCC is not able to see that a properly
      aligned user access cannot involve more than one segment.
      
          5060:	39 1d 00 04 	addi    r8,r29,4
          5064:	3d 20 b0 00 	lis     r9,-20480
          5068:	7c 08 48 40 	cmplw   r8,r9
          506c:	40 81 00 08 	ble     5074 <perf_copy_attr+0x2cc>
          5070:	3d 00 b0 00 	lis     r8,-20480
          5074:	39 28 ff ff 	addi    r9,r8,-1
          5078:	57 aa 00 06 	rlwinm  r10,r29,0,0,3
          507c:	55 29 27 3e 	rlwinm  r9,r9,4,28,31
          5080:	39 29 00 01 	addi    r9,r9,1
          5084:	7d 29 53 78 	or      r9,r9,r10
          5088:	91 22 04 dc 	stw     r9,1244(r2)
          508c:	7d 20 ed 26 	mfsrin  r9,r29
          5090:	55 29 00 80 	rlwinm  r9,r9,0,2,0
          5094:	7c 08 50 40 	cmplw   r8,r10
          5098:	40 81 00 c0 	ble     5158 <perf_copy_attr+0x3b0>
          509c:	7d 46 50 f8 	not     r6,r10
          50a0:	7c c6 42 14 	add     r6,r6,r8
          50a4:	54 c6 27 be 	rlwinm  r6,r6,4,30,31
          50a8:	7d 20 51 e4 	mtsrin  r9,r10
          50ac:	3c ea 10 00 	addis   r7,r10,4096
          50b0:	39 29 01 11 	addi    r9,r9,273
          50b4:	7f 88 38 40 	cmplw   cr7,r8,r7
          50b8:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          50bc:	40 9d 00 9c 	ble     cr7,5158 <perf_copy_attr+0x3b0>
      
          50c0:	2f 86 00 00 	cmpwi   cr7,r6,0
          50c4:	41 9e 00 4c 	beq     cr7,5110 <perf_copy_attr+0x368>
          50c8:	2f 86 00 01 	cmpwi   cr7,r6,1
          50cc:	41 9e 00 2c 	beq     cr7,50f8 <perf_copy_attr+0x350>
          50d0:	2f 86 00 02 	cmpwi   cr7,r6,2
          50d4:	41 9e 00 14 	beq     cr7,50e8 <perf_copy_attr+0x340>
          50d8:	7d 20 39 e4 	mtsrin  r9,r7
          50dc:	39 29 01 11 	addi    r9,r9,273
          50e0:	3c e7 10 00 	addis   r7,r7,4096
          50e4:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          50e8:	7d 20 39 e4 	mtsrin  r9,r7
          50ec:	39 29 01 11 	addi    r9,r9,273
          50f0:	3c e7 10 00 	addis   r7,r7,4096
          50f4:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          50f8:	7d 20 39 e4 	mtsrin  r9,r7
          50fc:	3c e7 10 00 	addis   r7,r7,4096
          5100:	39 29 01 11 	addi    r9,r9,273
          5104:	7f 88 38 40 	cmplw   cr7,r8,r7
          5108:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          510c:	40 9d 00 4c 	ble     cr7,5158 <perf_copy_attr+0x3b0>
          5110:	7d 20 39 e4 	mtsrin  r9,r7
          5114:	39 29 01 11 	addi    r9,r9,273
          5118:	3c c7 10 00 	addis   r6,r7,4096
          511c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          5120:	7d 20 31 e4 	mtsrin  r9,r6
          5124:	39 29 01 11 	addi    r9,r9,273
          5128:	3c c6 10 00 	addis   r6,r6,4096
          512c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          5130:	7d 20 31 e4 	mtsrin  r9,r6
          5134:	39 29 01 11 	addi    r9,r9,273
          5138:	3c c7 30 00 	addis   r6,r7,12288
          513c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          5140:	7d 20 31 e4 	mtsrin  r9,r6
          5144:	3c e7 40 00 	addis   r7,r7,16384
          5148:	39 29 01 11 	addi    r9,r9,273
          514c:	7f 88 38 40 	cmplw   cr7,r8,r7
          5150:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          5154:	41 9d ff bc 	bgt     cr7,5110 <perf_copy_attr+0x368>
      
          5158:	4c 00 01 2c 	isync
          515c:	39 20 00 80 	li      r9,128
          5160:	91 3d 00 00 	stw     r9,0(r29)
      
          5164:	38 e0 00 00 	li      r7,0
          5168:	90 e2 04 dc 	stw     r7,1244(r2)
          516c:	7d 20 ed 26 	mfsrin  r9,r29
          5170:	65 29 40 00 	oris    r9,r9,16384
          5174:	40 81 00 c0 	ble     5234 <perf_copy_attr+0x48c>
          5178:	7d 47 50 f8 	not     r7,r10
          517c:	7c e7 42 14 	add     r7,r7,r8
          5180:	54 e7 27 be 	rlwinm  r7,r7,4,30,31
          5184:	7d 20 51 e4 	mtsrin  r9,r10
          5188:	3d 4a 10 00 	addis   r10,r10,4096
          518c:	39 29 01 11 	addi    r9,r9,273
          5190:	7c 08 50 40 	cmplw   r8,r10
          5194:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          5198:	40 81 00 9c 	ble     5234 <perf_copy_attr+0x48c>
      
          519c:	2c 07 00 00 	cmpwi   r7,0
          51a0:	41 82 00 4c 	beq     51ec <perf_copy_attr+0x444>
          51a4:	2c 07 00 01 	cmpwi   r7,1
          51a8:	41 82 00 2c 	beq     51d4 <perf_copy_attr+0x42c>
          51ac:	2c 07 00 02 	cmpwi   r7,2
          51b0:	41 82 00 14 	beq     51c4 <perf_copy_attr+0x41c>
          51b4:	7d 20 51 e4 	mtsrin  r9,r10
          51b8:	39 29 01 11 	addi    r9,r9,273
          51bc:	3d 4a 10 00 	addis   r10,r10,4096
          51c0:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          51c4:	7d 20 51 e4 	mtsrin  r9,r10
          51c8:	39 29 01 11 	addi    r9,r9,273
          51cc:	3d 4a 10 00 	addis   r10,r10,4096
          51d0:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          51d4:	7d 20 51 e4 	mtsrin  r9,r10
          51d8:	3d 4a 10 00 	addis   r10,r10,4096
          51dc:	39 29 01 11 	addi    r9,r9,273
          51e0:	7c 08 50 40 	cmplw   r8,r10
          51e4:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          51e8:	40 81 00 4c 	ble     5234 <perf_copy_attr+0x48c>
          51ec:	7d 20 51 e4 	mtsrin  r9,r10
          51f0:	39 29 01 11 	addi    r9,r9,273
          51f4:	3c ea 10 00 	addis   r7,r10,4096
          51f8:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          51fc:	7d 20 39 e4 	mtsrin  r9,r7
          5200:	39 29 01 11 	addi    r9,r9,273
          5204:	3c e7 10 00 	addis   r7,r7,4096
          5208:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          520c:	7d 20 39 e4 	mtsrin  r9,r7
          5210:	39 29 01 11 	addi    r9,r9,273
          5214:	3c ea 30 00 	addis   r7,r10,12288
          5218:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          521c:	7d 20 39 e4 	mtsrin  r9,r7
          5220:	3d 4a 40 00 	addis   r10,r10,16384
          5224:	39 29 01 11 	addi    r9,r9,273
          5228:	7c 08 50 40 	cmplw   r8,r10
          522c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
          5230:	41 81 ff bc 	bgt     51ec <perf_copy_attr+0x444>
      
          5234:	4c 00 01 2c 	isync
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      [mpe: Export the ool handlers to fix build errors]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/d9121f96a7c4302946839a0771f5d1daeeb6968c.1622708530.git.christophe.leroy@csgroup.eu
      16132529
  8. 08 April 2021 (1 commit)
  9. 29 March 2021 (3 commits)
  10. 03 December 2020 (4 commits)
  11. 06 October 2020 (1 commit)
  12. 15 September 2020 (1 commit)
  13. 09 September 2020 (1 commit)
  14. 02 September 2020 (4 commits)
  15. 22 July 2020 (1 commit)
  16. 20 July 2020 (1 commit)
  17. 02 June 2020 (1 commit)
  18. 18 May 2020 (3 commits)
  19. 15 May 2020 (2 commits)
    •
      powerpc: Drop unneeded cast in task_pt_regs() · 24ac99e9
      Committed by Michael Ellerman
      There's no need for a cast in task_pt_regs(), as tsk->thread.regs
      should already be a struct pt_regs *. If someone is using
      task_pt_regs() on something that's not a task but happens to have a
      thread.regs, we'll deal with them later.
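      
      The change is essentially a one-line diff (hedged sketch, from the
      commit description rather than the verbatim patch):
      
        -#define task_pt_regs(tsk)      ((struct pt_regs *)(tsk)->thread.regs)
        +#define task_pt_regs(tsk)      ((tsk)->thread.regs)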
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200428123152.73566-1-mpe@ellerman.id.au
      24ac99e9
    •
      powerpc/64: Don't initialise init_task->thread.regs · 7ffa8b7d
      Committed by Michael Ellerman
      Aneesh increased the size of struct pt_regs by 16 bytes and started
      seeing this WARN_ON:
      
        smp: Bringing up secondary CPUs ...
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/process.c:455 giveup_all+0xb4/0x110
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+ #318
        NIP:  c00000000001a2b4 LR: c00000000001a29c CTR: c0000000031d0000
        REGS: c0000000026d3980 TRAP: 0700   Not tainted  (5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+)
        MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48048224  XER: 00000000
        CFAR: c000000000019cc8 IRQMASK: 1
        GPR00: c00000000001a264 c0000000026d3c20 c0000000026d7200 800000000280b033
        GPR04: 0000000000000001 0000000000000000 0000000000000077 30206d7372203164
        GPR08: 0000000000002000 0000000002002000 800000000280b033 3230303030303030
        GPR12: 0000000000008800 c0000000031d0000 0000000000800050 0000000002000066
        GPR16: 000000000309a1a0 000000000309a4b0 000000000309a2d8 000000000309a890
        GPR20: 00000000030d0098 c00000000264da40 00000000fd620000 c0000000ff798080
        GPR24: c00000000264edf0 c0000001007469f0 00000000fd620000 c0000000020e5e90
        GPR28: c00000000264edf0 c00000000264d200 000000001db60000 c00000000264d200
        NIP [c00000000001a2b4] giveup_all+0xb4/0x110
        LR [c00000000001a29c] giveup_all+0x9c/0x110
        Call Trace:
        [c0000000026d3c20] [c00000000001a264] giveup_all+0x64/0x110 (unreliable)
        [c0000000026d3c90] [c00000000001ae34] __switch_to+0x104/0x480
        [c0000000026d3cf0] [c000000000e0b8a0] __schedule+0x320/0x970
        [c0000000026d3dd0] [c000000000e0c518] schedule_idle+0x38/0x70
        [c0000000026d3df0] [c00000000019c7c8] do_idle+0x248/0x3f0
        [c0000000026d3e70] [c00000000019cbb8] cpu_startup_entry+0x38/0x40
        [c0000000026d3ea0] [c000000000011bb0] rest_init+0xe0/0xf8
        [c0000000026d3ed0] [c000000002004820] start_kernel+0x990/0x9e0
        [c0000000026d3f90] [c00000000000c49c] start_here_common+0x1c/0x400
      
      Which was unexpected. The warning is checking the thread.regs->msr
      value of the task we are switching from:
      
        usermsr = tsk->thread.regs->msr;
        ...
        WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & MSR_VEC)));
      
      i.e. if MSR_VSX is set then both MSR_FP and MSR_VEC should also be set.
      
      Dumping tsk->thread.regs->msr we see that it's: 0x1db60000
      
      This is not a normal-looking MSR; in fact the only valid bit is
      MSR_VSX, and all the other bits are reserved in the current
      definition of the MSR.
      
      We can see from the oops that it was swapper/0 that we were switching
      from when we hit the warning, i.e. init_task. So its thread.regs
      points to the base (high addresses) of init_stack.
      
      Dumping the content of init_task->thread.regs, with the members of
      pt_regs annotated (the 16 bytes larger version), we see:
      
        0000000000000000 c000000002780080    gpr[0]     gpr[1]
        0000000000000000 c000000002666008    gpr[2]     gpr[3]
        c0000000026d3ed0 0000000000000078    gpr[4]     gpr[5]
        c000000000011b68 c000000002780080    gpr[6]     gpr[7]
        0000000000000000 0000000000000000    gpr[8]     gpr[9]
        c0000000026d3f90 0000800000002200    gpr[10]    gpr[11]
        c000000002004820 c0000000026d7200    gpr[12]    gpr[13]
        000000001db60000 c0000000010aabe8    gpr[14]    gpr[15]
        c0000000010aabe8 c0000000010aabe8    gpr[16]    gpr[17]
        c00000000294d598 0000000000000000    gpr[18]    gpr[19]
        0000000000000000 0000000000001ff8    gpr[20]    gpr[21]
        0000000000000000 c00000000206d608    gpr[22]    gpr[23]
        c00000000278e0cc 0000000000000000    gpr[24]    gpr[25]
        000000002fff0000 c000000000000000    gpr[26]    gpr[27]
        0000000002000000 0000000000000028    gpr[28]    gpr[29]
        000000001db60000 0000000004750000    gpr[30]    gpr[31]
        0000000002000000 000000001db60000    nip        msr
        0000000000000000 0000000000000000    orig_r3    ctr
        c00000000000c49c 0000000000000000    link       xer
        0000000000000000 0000000000000000    ccr        softe
        0000000000000000 0000000000000000    trap       dar
        0000000000000000 0000000000000000    dsisr      result
        0000000000000000 0000000000000000    ppr        kuap
        0000000000000000 0000000000000000    pad[2]     pad[3]
      
      This looks suspiciously like stack frames, not a pt_regs. If we look
      closely we can see return addresses from the stack trace above,
      c000000002004820 (start_kernel) and c00000000000c49c (start_here_common).
      
      init_task->thread.regs is setup at build time in processor.h:
      
        #define INIT_THREAD  { \
        	.ksp = INIT_SP, \
        	.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
      
      The early boot code where we setup the initial stack is:
      
        LOAD_REG_ADDR(r3,init_thread_union)
      
        /* set up a stack pointer */
        LOAD_REG_IMMEDIATE(r1,THREAD_SIZE)
        add	r1,r3,r1
        li	r0,0
        stdu	r0,-STACK_FRAME_OVERHEAD(r1)
      
      This creates a stack frame of 112 bytes (STACK_FRAME_OVERHEAD), which
      is far too small to contain a pt_regs.
      
      So the result is init_task->thread.regs is pointing at some stack
      frames on the init stack, not at a pt_regs.
      
      We have gotten away with this for so long because with pt_regs at its
      current size the MSR happens to point into the first frame, at a
      location that is not written to by the early asm. With the 16 byte
      expansion the MSR falls into the second frame, which is used by the
      compiler, and collides with a saved register that tends to be
      non-zero.
      
      As far as I can see this has been wrong since the original merge of
      64-bit ppc support, back in 2002.
      
      Conceptually swapper should have no regs: it never entered from
      userspace, and in fact that's what we do on 32-bit. It's also
      presumably what the "bogus" comment is referring to.
      
      So I think the right fix is to just not initialise regs at all. I'm
      slightly worried this will break some code that isn't prepared for a
      NULL regs, but we'll have to see.
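      
      Concretely, the fix is to delete the initialiser quoted above from
      INIT_THREAD, leaving thread.regs zeroed (NULL); a hedged sketch of
      the resulting diff:
      
          #define INIT_THREAD  { \
                .ksp = INIT_SP, \
        -       .regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \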
      
      Remove the comment in head_64.S which refers to us setting up the
      regs (even though we never did), and is otherwise not really accurate
      any more.
      Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200428123130.73078-1-mpe@ellerman.id.au
      7ffa8b7d
  20. 20 April 2020 (1 commit)
  21. 18 February 2020 (1 commit)
    •
      powerpc/32s: Fix DSI and ISI exceptions for CONFIG_VMAP_STACK · 232ca1ee
      Committed by Christophe Leroy
      hash_page() needs to read page tables from kernel memory. When the
      entire kernel memory is mapped by BATs, which is normally the case when
      CONFIG_STRICT_KERNEL_RWX is not set, it works even if the page hosting
      the page table is not referenced in the MMU hash table.
      
      However, if the page where the page table resides is not covered by
      a BAT, a DSI fault can be encountered from hash_page(), and it loops
      forever. This can happen when CONFIG_STRICT_KERNEL_RWX is selected
      and the alignment of the different regions is too small to allow
      covering the entire memory with BATs. This also happens when
      CONFIG_DEBUG_PAGEALLOC is selected or when booting with the 'nobats'
      flag.
      
      Also, if the page containing the kernel stack is not present in the
      MMU hash table, registers cannot be saved and a recursive DSI fault
      is encountered.
      
      To allow hash_page() to properly do its job at all times and load the
      MMU hash table whenever needed, it must run with the data MMU
      disabled. This means it must be called before re-enabling the data
      MMU. To allow this, registers clobbered by hash_page() and
      create_hpte() have to be saved in the thread struct together with
      SRR0, SRR1, DAR and DSISR. It is also necessary to ensure that the
      DSI prolog doesn't overwrite registers saved by the prolog of the
      currently running exception. That means:
      - DSI can only use SPRN_SPRG_SCRATCH0
      - Exceptions must free SPRN_SPRG_SCRATCH0 before writing to the stack.
      
      This also fixes the Oops reported by Erhard when create_hpte() is
      called by add_hash_page().
      
      Due to the prolog size increase, a few more exceptions had to be
      split in two parts.
      
      Fixes: cd08f109 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
      Reported-by: Erhard F. <erhard_f@mailbox.org>
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Tested-by: Erhard F. <erhard_f@mailbox.org>
      Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=206501
      Link: https://lore.kernel.org/r/64a4aa44686e9fd4b01333401367029771d9b231.1581761633.git.christophe.leroy@c-s.fr
      232ca1ee
  22. 26 January 2020 (1 commit)
  23. 16 January 2020 (1 commit)
  24. 15 June 2019 (1 commit)
  25. 31 May 2019 (1 commit)
  26. 30 April 2019 (1 commit)
    •
      powerpc/64s: Reimplement book3s idle code in C · 10d91611
      Committed by Nicholas Piggin
      Reimplement the Book3S idle code in C, moving the POWER7/8/9
      implementation-specific HV idle code to the powernv platform code.
      
      Book3S assembly stubs are kept in common code and used only to save
      the stack frame and non-volatile GPRs before executing architected
      idle instructions, and restoring the stack and reloading GPRs then
      returning to C after waking from idle.
      
      The complex logic dealing with threads and subcores, locking, SPRs,
      HMIs, timebase resync, etc., is all done in C which makes it more
      maintainable.
      
      This is not a strict translation to C code; there are some
      significant differences:
      
      - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
        but saves and restores them itself.
      
      - The optimisation where EC=ESL=0 idle modes did not have to save GPRs
        or change MSR is restored, because it's now simple to do. ESL=1
        sleeps that do not lose GPRs can use this optimisation too.
      
      - KVM secondary entry and cede is now more of a call/return style
        rather than branchy. nap_state_lost is not required because KVM
        always returns via NVGPR restoring path.
      
      - KVM secondary wakeup from offline sequence is moved entirely into
        the offline wakeup, which avoids a hwsync in the normal idle wakeup
        path.
      
      Performance, measured with context-switch ping-pong on different
      threads or cores, is possibly improved by a small amount, 1-3% for
      shallow states depending on stop state and on whether cores or
      threads are tested. For deep states it's in the noise compared with
      other latencies.
      
      KVM improvements:
      
      - Idle sleepers now always return to caller rather than branch out
        to KVM first.
      
      - This allows optimisations like very fast return to caller when no
        state has been lost.
      
      - KVM no longer requires nap_state_lost because it controls NVGPR
        save/restore itself on the way in and out.
      
      - The heavy idle wakeup KVM request check can be moved out of the
        normal host idle code and into the not-performance-critical offline
        code.
      
      - KVM nap code now returns from where it is called, which makes the
        flow a bit easier to follow.
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Squash the KVM changes in]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      10d91611
  27. 21 April 2019 (1 commit)
    •
      powerpc/32s: Implement Kernel Userspace Access Protection · a68c31fc
      Committed by Christophe Leroy
      This patch implements Kernel Userspace Access Protection for
      book3s/32.
      
      Due to limitations of the processor's page protection capabilities,
      the protection is only against writing; read protection cannot be
      achieved using page protection.
      
      The previous patch modifies the page protection so that RW user
      pages are RW for Key 0 and RO for Key 1, and it sets Key 0 for
      both user and kernel.
      
      With this patch, userspace segment registers are set to Ku 0 and
      Ks 1. When the kernel needs to write to RW pages, the associated
      segment register is changed to Ks 0 in order to allow the kernel
      write access.
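      
      A hedged sketch of that toggle (helper name illustrative; Ks is
      bit 1 of the segment register, i.e. 0x40000000):
      
        static inline void allow_write_via_segment(unsigned long addr)
        {
                u32 sr;
        
                /* read the SR covering addr, switch it to Ks 0 (Key 0 => RW
                 * for the kernel), write it back and resynchronise */
                asm volatile("mfsrin %0, %1" : "=r" (sr) : "r" (addr));
                sr &= ~0x40000000;
                asm volatile("mtsrin %0, %1; isync"
                             : : "r" (sr), "r" (addr) : "memory");
        }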
      
      In order to avoid having to read all segment registers when
      locking/unlocking the access, some data is kept in the thread_struct
      and saved on the stack on exceptions. The field identifies both the
      first unlocked segment and the first segment following the last
      unlocked one. When no segment is unlocked, it contains the value 0.
      
      As the hash_page() function is not able to easily determine if a
      protfault is due to a bad kernel access to userspace, protfaults
      need to be handled by handle_page_fault when KUAP is set.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      [mpe: Drop allow_read/write_to/from_user() as they're now in kup.h,
            and adapt allow_user_access() to do nothing when to == NULL]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a68c31fc