1. 23 Feb, 2019 8 commits
    • C
      powerpc: Activate CONFIG_THREAD_INFO_IN_TASK · ed1cd6de
      Christophe Leroy committed
      This patch activates CONFIG_THREAD_INFO_IN_TASK which
      moves the thread_info into task_struct.
      
      Moving thread_info into task_struct has the following advantages:
        - It protects thread_info from corruption in the case of stack
          overflows.
        - Its address is harder to determine if stack addresses are leaked,
          making a number of attacks more difficult.
      
      This has the following consequences:
        - thread_info is now located at the beginning of task_struct.
        - The 'cpu' field is now in task_struct, and only exists when
          CONFIG_SMP is active.
        - thread_info no longer has the 'task' field.
      
      This patch:
        - Removes all copying of the thread_info struct when the stack changes.
        - Changes the CURRENT_THREAD_INFO() macro to point to current.
        - Selects CONFIG_THREAD_INFO_IN_TASK.
        - Modifies raw_smp_processor_id() to get ->cpu from current without
          including linux/sched.h, to avoid circular inclusion, and without
          including asm/asm-offsets.h, to avoid duplicating symbol names
          between ASM constants and C constants.
        - Modifies klp_init_thread_info() to take a task_struct pointer
          argument.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Add task_stack.h to livepatch.h to fix build fails]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ed1cd6de
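The effect of placing thread_info at the beginning of task_struct (commit ed1cd6de above) can be illustrated with a minimal user-space sketch. The struct and function names below are hypothetical, heavily reduced stand-ins and not the kernel definitions; the only point shown is that, once thread_info sits at offset 0, finding it from the task pointer is a plain cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct thread_info_sketch {
	unsigned long flags;
};

struct task_struct_sketch {
	struct thread_info_sketch thread_info;	/* must stay the first member */
	unsigned int cpu;			/* lives here, not in thread_info */
	void *stack;
};

/* With thread_info embedded at offset 0, the lookup is a plain cast. */
static struct thread_info_sketch *
task_thread_info_sketch(struct task_struct_sketch *tsk)
{
	assert(offsetof(struct task_struct_sketch, thread_info) == 0);
	return (struct thread_info_sketch *)tsk;
}
```

Because the stack no longer holds thread_info, nothing needs to be copied when the stack changes, which is consistent with the removals listed in the commit message.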
    • A
      powerpc: Enable kcov · fb0b0a73
      Andrew Donnellan committed
      kcov provides kernel coverage data that's useful for fuzzing tools like
      syzkaller.
      
      Wire up kcov support on powerpc. Disable kcov instrumentation on the same
      files where we currently disable gcov and UBSan instrumentation, plus some
      additional exclusions which appear necessary to boot on book3e machines.
      Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Daniel Axtens <dja@axtens.net> # e6500
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      fb0b0a73
    • C
      powerpc/kconfig: make _etext and data areas alignment configurable on 8xx · 8f54a6f7
      Christophe Leroy committed
      On 8xx, large pages (512kb or 8M) are used to map kernel linear
      memory. Aligning to 8M reduces TLB misses as only 8M pages are used
      in that case. We make 8M the default for data.
      
      This patch allows the user to do it via Kconfig.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8f54a6f7
    • C
      powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX · d5f17ee9
      Christophe Leroy committed
      This patch implements handling of STRICT_KERNEL_RWX with
      large TLBs directly in the TLB miss handlers.
      
      To do so, etext and sinittext are aligned on 512kB boundaries
      and the miss handlers use 512kB pages instead of 8Mb pages for
      addresses close to the boundaries.
      
      It sets RO PP flags for addresses under sinittext.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d5f17ee9
    • C
      powerpc/kconfig: make _etext and data areas alignment configurable on Book3s 32 · 0f4a9041
      Christophe Leroy committed
      Depending on the number of available BATs for mapping the different
      kernel areas, it might be necessary to increase the alignment of _etext
      and/or of data areas.
      
      This patch allows the user to do it via Kconfig.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      0f4a9041
    • C
      powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX · 63b2bc61
      Christophe Leroy committed
      Today, STRICT_KERNEL_RWX is based on the use of regular pages
      to map kernel pages.
      
      On Book3s 32, it has three consequences:
      - Using pages instead of BAT for mapping kernel linear memory severely
      impacts performance.
      - Exec protection is not effective because no-execute cannot be set at
      page level (except on 603 which doesn't have hash tables)
      - Write protection is not effective because PP bits do not provide RO
      mode for kernel-only pages (except on 603 which handles it in software
      via PAGE_DIRTY)
      
      On the 603+, we have:
      - Independent IBAT and DBAT allowing limitation of exec parts.
      - NX bit can be set in segment registers to forbid execution on memory
      mapped by pages.
      - RO mode on DBATs even for kernel-only blocks.
      
      On the 601, there is nothing much we can do other than warn the user
      about it, because:
      - BATs are common to instructions and data.
      - BATs do not provide RO mode for kernel-only blocks.
      - segment registers don't have the NX bit.
      
      In order to use IBAT for exec protection, this patch:
      - Aligns _etext to BAT block sizes (128kb)
      - Sets the NX bit in kernel segment registers (except on the vmalloc
      area when CONFIG_MODULES is selected)
      - Maps kernel text with IBATs.
      
      In order to use DBAT for write protection, this patch:
      - Aligns RW DATA to BAT block sizes (4M)
      - Maps kernel RO area with write prohibited DBATs
      - Maps remaining memory with remaining DBATs
      
      Here is what we get with this patch on a 832x when activating
      STRICT_KERNEL_RWX:
      
      Symbols:
      c0000000 T _stext
      c0680000 R __start_rodata
      c0680000 R _etext
      c0800000 T __init_begin
      c0800000 T _sinittext
      
      ~# cat /sys/kernel/debug/block_address_translation
      ---[ Instruction Block Address Translation ]---
      0: 0xc0000000-0xc03fffff 0x00000000 Kernel EXEC coherent
      1: 0xc0400000-0xc05fffff 0x00400000 Kernel EXEC coherent
      2: 0xc0600000-0xc067ffff 0x00600000 Kernel EXEC coherent
      3:         -
      4:         -
      5:         -
      6:         -
      7:         -
      
      ---[ Data Block Address Translation ]---
      0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent
      1: 0xc0800000-0xc0ffffff 0x00800000 Kernel RW coherent
      2: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent
      3: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent
      4: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent
      5: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent
      6: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
      7:         -
      
      ~# cat /sys/kernel/debug/segment_registers
      ---[ User Segments ]---
      0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xa085d0
      0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xa086e1
      0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xa087f2
      0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xa08903
      0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xa08a14
      0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xa08b25
      0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xa08c36
      0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xa08d47
      0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xa08e58
      0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xa08f69
      0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xa0907a
      0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xa0918b
      
      ---[ Kernel Segments ]---
      0xc0000000-0xcfffffff Kern key 0 User key 1 No Exec VSID 0x000ccc
      0xd0000000-0xdfffffff Kern key 0 User key 1 No Exec VSID 0x000ddd
      0xe0000000-0xefffffff Kern key 0 User key 1 No Exec VSID 0x000eee
      0xf0000000-0xffffffff Kern key 0 User key 1 No Exec VSID 0x000fff
      
      Aligning _etext to 128kb allows mapping up to 32Mb of text with 8 IBATs:
      16Mb + 8Mb + 4Mb + 2Mb + 1Mb + 512kb + 256kb + 128kb (+ 128kb) = 32Mb
      (A 9th IBAT is unneeded as 32Mb would need only a single 32Mb block)
      
      Aligning data to 4M allows mapping up to 512Mb of data with 8 DBATs:
      16Mb + 8Mb + 4Mb + 4Mb + 32Mb + 64Mb + 128Mb + 256Mb = 512Mb
      
      Because some processors only have 4 BATs and because some targets need
      DBATs for mapping other areas, the following patch will make it possible
      to modify the _etext and data alignment.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      63b2bc61
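The BAT layout shown in the debugfs dump of commit 63b2bc61 above can be reproduced with a small greedy decomposition: each block is the largest power of two that is naturally aligned at its start address and still fits in the remaining range. The following is only a user-space sketch of that arithmetic, not the kernel's mapping code. The function names are hypothetical, and the 128 KB minimum and 256 MB maximum block sizes as well as the use of physical offsets are assumptions made for the illustration. Both ends of the range are assumed to be 128 KB aligned.

```c
#include <assert.h>
#include <stddef.h>

#define SZ_128K 0x00020000UL	/* assumed smallest BAT block */
#define SZ_256M 0x10000000UL	/* assumed largest BAT block */

/*
 * Largest power-of-two block that starts at base, does not run past top,
 * and respects the natural alignment of base.
 */
static unsigned long bat_block_size(unsigned long base, unsigned long top)
{
	unsigned long size = SZ_256M;

	assert(base < top);
	while (size > SZ_128K && ((base & (size - 1)) || size > top - base))
		size >>= 1;
	return size;
}

/*
 * Split [base, top) into at most max blocks.
 * Returns the number of blocks used, or 0 if the range does not fit.
 */
static size_t bat_split(unsigned long base, unsigned long top,
			unsigned long *sizes, size_t max)
{
	size_t n = 0;

	while (base < top) {
		if (n == max)
			return 0;
		sizes[n] = bat_block_size(base, top);
		base += sizes[n++];
	}
	return n;
}
```

Splitting the text range 0x000000-0x680000 yields 4M + 2M + 512k, matching IBATs 0 to 2 in the dump, and splitting 0x00800000-0x20000000 yields the six RW DBATs (8M up to 256M).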
    • C
      powerpc/kconfig: define CONFIG_DATA_SHIFT and CONFIG_ETEXT_SHIFT · 166d97d9
      Christophe Leroy committed
      CONFIG_STRICT_KERNEL_RWX requires a special alignment
      for DATA for some subarches. Today it is just defined
      as an #ifdef in vmlinux.lds.S.
      
      In order to get more flexibility, this patch moves the
      definition of this alignment into Kconfig.
      
      On some subarches, CONFIG_STRICT_KERNEL_RWX will
      require a special alignment of _etext.
      
      This patch also adds a configuration item for it in Kconfig.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      166d97d9
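The alignment expressed by a shift value such as CONFIG_DATA_SHIFT or CONFIG_ETEXT_SHIFT (commit 166d97d9 above) amounts to rounding an address up to the next multiple of 2^shift. The helper below is a hypothetical user-space sketch of that rounding, not the linker-script code; the shift values 17 (128 KB) and 22 (4 MB) are picked to match the sizes quoted in the BAT commit, and the unaligned input address is made up for the example.

```c
#include <assert.h>

/* Round addr up to the next multiple of 2^shift. */
static unsigned long align_up_shift(unsigned long addr, unsigned int shift)
{
	unsigned long mask;

	assert(shift < 8 * sizeof(unsigned long));
	mask = (1UL << shift) - 1;
	return (addr + mask) & ~mask;
}
```

For instance, a text end at the made-up address 0xc0673a10 rounds up to 0xc0680000 with a shift of 17, and 0xc0680000 rounds up to 0xc0800000 with a shift of 22, which are the _etext and __init_begin values shown in the symbol list of the BAT commit.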
    • C
      powerpc/kconfig: define PAGE_SHIFT inside Kconfig · 555f4fdb
      Christophe Leroy committed
      This patch defines CONFIG_PPC_PAGE_SHIFT in order
      to be able to use the PAGE_SHIFT value inside Kconfig.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      555f4fdb
  2. 21 Feb, 2019 2 commits
  3. 18 Feb, 2019 2 commits
  4. 04 Feb, 2019 1 commit
  5. 01 Feb, 2019 1 commit
  6. 31 Jan, 2019 1 commit
  7. 21 Dec, 2018 2 commits
  8. 20 Dec, 2018 1 commit
    • C
      powerpc: use mm zones more sensibly · 25078dc1
      Christoph Hellwig committed
      Powerpc has a somewhat odd usage where ZONE_DMA is used for all memory on
      common 64-bit configs, and ZONE_DMA32 is used for 31-bit schemes.
      
      Move to a scheme closer to what other architectures use (and I dare to
      say the intent of the system):
      
       - ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only)
       - ZONE_NORMAL: everything addressable by the kernel
       - ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels
      
      Also provide information on how ZONE_DMA is used by defining
      ARCH_ZONE_DMA_BITS.
      
      Contains various fixes from Benjamin Herrenschmidt.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      25078dc1
  9. 19 Dec, 2018 1 commit
    • C
      powerpc: implement CONFIG_DEBUG_VIRTUAL · 6bf752da
      Christophe Leroy committed
      This patch implements CONFIG_DEBUG_VIRTUAL to warn about
      incorrect use of virt_to_phys() and page_to_phys().
      
      Below is the result of test_debug_virtual:
      
      [    1.438746] WARNING: CPU: 0 PID: 1 at ./arch/powerpc/include/asm/io.h:808 test_debug_virtual_init+0x3c/0xd4
      [    1.448156] CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc5-00560-g6bfb52e23a00-dirty #532
      [    1.457259] NIP:  c066c550 LR: c0650ccc CTR: c066c514
      [    1.462257] REGS: c900bdb0 TRAP: 0700   Not tainted  (4.20.0-rc5-00560-g6bfb52e23a00-dirty)
      [    1.471184] MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 48000422  XER: 20000000
      [    1.477811]
      [    1.477811] GPR00: c0650ccc c900be60 c60d0000 00000000 006000c0 c9000000 00009032 c7fa0020
      [    1.477811] GPR08: 00002400 00000001 09000000 00000000 c07b5d04 00000000 c00037d8 00000000
      [    1.477811] GPR16: 00000000 00000000 00000000 00000000 c0760000 c0740000 00000092 c0685bb0
      [    1.477811] GPR24: c065042c c068a734 c0685b8c 00000006 00000000 c0760000 c075c3c0 ffffffff
      [    1.512711] NIP [c066c550] test_debug_virtual_init+0x3c/0xd4
      [    1.518315] LR [c0650ccc] do_one_initcall+0x8c/0x1cc
      [    1.523163] Call Trace:
      [    1.525595] [c900be60] [c0567340] 0xc0567340 (unreliable)
      [    1.530954] [c900be90] [c0650ccc] do_one_initcall+0x8c/0x1cc
      [    1.536551] [c900bef0] [c0651000] kernel_init_freeable+0x1f4/0x2cc
      [    1.542658] [c900bf30] [c00037ec] kernel_init+0x14/0x110
      [    1.547913] [c900bf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c
      [    1.553971] Instruction dump:
      [    1.556909] 3ca50100 bfa10024 54a5000e 3fa0c076 7c0802a6 3d454000 813dc204 554893be
      [    1.564566] 7d294010 7d294910 90010034 39290001 <0f090000> 7c3e0b78 955e0008 3fe0c062
      [    1.572425] ---[ end trace 6f6984225b280ad6 ]---
      [    1.577467] PA: 0x09000000 for VA: 0xc9000000
      [    1.581799] PA: 0x061e8f50 for VA: 0xc61e8f50
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      6bf752da
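The two "PA ... for VA ..." lines in the test output of commit 6bf752da above can be read as a linear-map translation with a validity check. The sketch below is a user-space illustration only, not the kernel implementation: all names are hypothetical, PAGE_OFFSET is taken as 0xc0000000 from the addresses in the log, and the end of the linear mapping (0xc8000000, i.e. 128 MB of RAM) is purely an assumption chosen so that 0xc61e8f50 is inside and 0xc9000000 is outside.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_OFFSET_SKETCH 0xc0000000UL	/* start of the linear mapping */

/* Assumed end of the linear mapping (128 MB of RAM); hypothetical value. */
static const unsigned long high_memory_sketch = 0xc8000000UL;

/* A virtual address is translatable only if it lies in the linear map. */
static bool virt_addr_valid_sketch(unsigned long va)
{
	return va >= PAGE_OFFSET_SKETCH && va < high_memory_sketch;
}

/*
 * Returns the physical address computed by simple subtraction;
 * *warned reports whether a debug build would have warned.
 */
static unsigned long virt_to_phys_sketch(unsigned long va, bool *warned)
{
	assert(warned);
	*warned = !virt_addr_valid_sketch(va);
	return va - PAGE_OFFSET_SKETCH;
}
```

Under these assumptions, 0xc9000000 translates to 0x09000000 with a warning, while 0xc61e8f50 translates to 0x061e8f50 silently, which mirrors the shape of the test output.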
  10. 06 Dec, 2018 1 commit
  11. 04 Dec, 2018 2 commits
    • C
      powerpc/8xx: reintroduce 16K pages with HW assistance · 55c8fc3f
      Christophe Leroy committed
      Using this HW assistance implies some constraints on the
      page table structure:
      - Regardless of the main page size used (4k or 16k), the
      level 1 table (PGD) contains 1024 entries and each PGD entry covers
      a 4Mbytes area which is managed by a level 2 table (PTE) also
      containing 1024 entries, each describing a 4k page.
      - 16k pages require 4 identical entries in the L2 table.
      - 512k page PTEs have to be spread every 128 bytes in the L2 table.
      - 8M page PTEs are at the address pointed to by the L1 entry and each
      8M page requires 2 identical entries in the PGD.
      
      In order to use hardware assistance with 16K pages, this patch makes
      the following modifications:
      - Make PGD size independent of the main page size
      - In 16k pages mode, redefine pte_t as a struct with 4 elements,
      and populate those 4 elements in __set_pte_at() and pte_update()
      - Adapt the size of the hugepage tables.
      - Define a PTE_FRAGMENT_NB so that a 16k page contains 4 page tables.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      55c8fc3f
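The 16k-page scheme described in commit 55c8fc3f above, where one Linux PTE is stored as four identical hardware entries, can be sketched as follows. This is an illustration with hypothetical type and function names, not the kernel's pte_t or __set_pte_at(). The 4-byte entry size is inferred from the message (1024 entries per 4k table) rather than stated in it.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_16K_SKETCH	16384u
#define PTRS_PER_TABLE_SKETCH	1024u

typedef uint32_t pte_basic_sketch_t;	/* one hardware L2 entry (assumed 4 bytes) */

/* In 16k mode, one logical PTE occupies four consecutive L2 entries. */
typedef struct {
	pte_basic_sketch_t pte[4];
} pte16k_sketch_t;

/* Populate all four entries with the same value, as the hardware requires. */
static void set_pte16k_sketch(pte16k_sketch_t *ptep, pte_basic_sketch_t val)
{
	assert(ptep);
	for (int i = 0; i < 4; i++)
		ptep->pte[i] = val;
}

/* Number of 1024-entry page tables that fit in one 16k page. */
static unsigned int pte_fragment_nb_sketch(void)
{
	return (unsigned int)(PAGE_SIZE_16K_SKETCH /
			      (PTRS_PER_TABLE_SKETCH * sizeof(pte_basic_sketch_t)));
}
```

With these sizes a page table takes 4 KB, so a 16k page holds four of them, which is the value the message gives for PTE_FRAGMENT_NB.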
    • C
      powerpc/8xx: Temporarily disable 16k pages and hugepages · 5af543be
      Christophe Leroy committed
      In preparation for making use of hardware assistance in TLB handlers,
      this patch temporarily disables 16K pages and hugepages. The reason
      is that when using HW assistance in 4K pages mode, the Linux model
      fits with the HW model for 4K pages and 8M pages.
      
      However, for 16K pages and 512K mode, some additional work is needed
      to get the Linux model to fit with the HW model.
      The 8M pages will naturally come back when we switch to
      HW assistance, without any additional handling.
      In order to keep the following patch smaller, the current special
      handling for 8M pages gets removed here as well.
      
      Therefore the 4K pages mode will be implemented first and without
      support for 512k hugepages. Then the 512k hugepages will be brought
      back. And the 16K pages will be implemented in the following step.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5af543be
  12. 26 Nov, 2018 2 commits
  13. 23 Nov, 2018 6 commits
  14. 01 Nov, 2018 2 commits
  15. 31 Oct, 2018 2 commits
  16. 20 Oct, 2018 2 commits
    • C
      powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64 · abcff86d
      Christophe Leroy committed
      Scaled cputime is only meaningful when the processor has
      SPURR and/or PURR, which means only on PPC64.
      
      Removing it on PPC32 significantly reduces the size of
      vtime_account_system() and vtime_account_idle() on an 8xx:
      
      Before:
      00000000 l     F .text	000000a8 vtime_delta
      00000280 g     F .text	0000010c vtime_account_system
      0000038c g     F .text	00000048 vtime_account_idle
      
      After:
      (vtime_delta gets inlined inside the two functions)
      000001d8 g     F .text	000000a0 vtime_account_system
      00000278 g     F .text	00000038 vtime_account_idle
      
      In terms of performance, we also get approximately 7% improvement on
      task switch. The following small benchmark app is run with perf stat:
      
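      #define _GNU_SOURCE	/* for pthread_yield() on glibc */
      #include <pthread.h>
      #include <stdlib.h>
      
      /* Yield the CPU as many times as requested by the argument string. */
      void *thread(void *arg)
      {
      	int i;
      
      	for (i = 0; i < atoi((char*)arg); i++)
      		pthread_yield();
      
      	return NULL;
      }
      
      int main(int argc, char **argv)
      {
      	pthread_t th1, th2;
      
      	/* The iteration count is mandatory. */
      	if (argc < 2)
      		return 1;
      
      	/* Two threads yielding to each other force context switches. */
      	pthread_create(&th1, NULL, thread, argv[1]);
      	pthread_create(&th2, NULL, thread, argv[1]);
      	pthread_join(th1, NULL);
      	pthread_join(th2, NULL);
      
      	return 0;
      }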
      
      Before the patch:
      
       Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):
      
             8228.476465      task-clock (msec)         #    0.954 CPUs utilized            ( +-  0.23% )
                  200004      context-switches          #    0.024 M/sec                    ( +-  0.00% )
      
      After the patch:
      
       Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):
      
             7649.070444      task-clock (msec)         #    0.955 CPUs utilized            ( +-  0.27% )
                  200004      context-switches          #    0.026 M/sec                    ( +-  0.00% )
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      abcff86d
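The "approximately 7%" figure in commit abcff86d above follows directly from the two task-clock values reported by perf stat. The helpers below are a hypothetical sketch of that arithmetic only. Dividing task-clock by the number of context switches gives a rough cost per iteration, which includes the yield call itself and not just the switch.

```c
#include <assert.h>

/* Relative reduction, in percent, when going from before to after. */
static double improvement_percent(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* Average task-clock time per context switch, in microseconds. */
static double usec_per_switch(double task_clock_msec, unsigned long switches)
{
	assert(switches > 0);
	return task_clock_msec * 1000.0 / (double)switches;
}
```

With the reported values, improvement_percent(8228.476465, 7649.070444) is a little above 7.0, and the per-switch time drops from roughly 41 to roughly 38 microseconds over the 200004 context switches.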
    • N
      powerpc: Add support for function error injection · 7cd01b08
      Naveen N. Rao committed
      We implement regs_set_return_value() and override_function_with_return()
      for this purpose.
      
      On powerpc, a return from a function (blr) just branches to the location
      contained in the link register. So, we can just update pt_regs rather
      than redirecting execution to a dummy function that returns.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7cd01b08
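The approach described in commit 7cd01b08 above, emulating blr by updating the saved registers, can be sketched in user space with a mock register structure. The struct and the _sketch function names are hypothetical stand-ins, not the kernel's pt_regs or the functions named in the message. The sketch assumes the return value travels in r3, as in the powerpc ABIs, and that the saved link register holds the caller's return address at function entry.

```c
#include <assert.h>

/* Reduced mock of the saved register state at function entry. */
struct pt_regs_sketch {
	unsigned long gpr[32];
	unsigned long nip;	/* next instruction pointer */
	unsigned long link;	/* link register: caller's return address */
};

/* Place the injected return value where the caller expects it (r3). */
static void regs_set_return_value_sketch(struct pt_regs_sketch *regs,
					 unsigned long rc)
{
	assert(regs);
	regs->gpr[3] = rc;
}

/* Emulate 'blr': resume execution at the address held in the link register. */
static void override_function_with_return_sketch(struct pt_regs_sketch *regs)
{
	assert(regs);
	regs->nip = regs->link;
}
```

After both calls, the probed function body is skipped entirely: execution resumes in the caller, which sees the injected value as the function's result.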
  17. 13 Oct, 2018 2 commits
  18. 03 Oct, 2018 2 commits
    • C
      powerpc/64: add stack protector support · 06ec27ae
      Christophe Leroy committed
      On PPC64, as register r13 points to the paca_struct at all times,
      this patch adds a copy of the canary there, which is copied at
      task_switch.
      That new canary is then used through the following GCC options:
      -mstack-protector-guard=tls
      -mstack-protector-guard-reg=r13
      -mstack-protector-guard-offset=offsetof(struct paca_struct, canary)
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      06ec27ae
    • C
      powerpc/32: add stack protector support · c3ff2a51
      Christophe Leroy committed
      This functionality was tentatively added in the past
      (commit 6533b7c1 ("powerpc: Initial stack protector
      (-fstack-protector) support")) but had to be reverted
      (commit f2574030 ("powerpc: Revert the initial stack
      protector support")) because GCC implemented it differently
      depending on whether it had been built with libc support or not.
      
      Now, GCC offers the possibility to manually set the
      stack-protector mode (global or tls) regardless of libc support.
      
      This time, the patch selects HAVE_STACKPROTECTOR only if
      -mstack-protector-guard=tls is supported by GCC.
      
      On PPC32, as register r2 points to the current task_struct at
      all times, the stack_canary located inside task_struct can be
      used directly through the following GCC options:
      -mstack-protector-guard=tls
      -mstack-protector-guard-reg=r2
      -mstack-protector-guard-offset=offsetof(struct task_struct, stack_canary)
      
      The protector is disabled for prom_init and bootx_init as
      it is too early to handle it properly.
      
       $ echo CORRUPT_STACK > /sys/kernel/debug/provoke-crash/DIRECT
      [  134.943666] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: lkdtm_CORRUPT_STACK+0x64/0x64
      [  134.943666]
      [  134.955414] CPU: 0 PID: 283 Comm: sh Not tainted 4.18.0-s3k-dev-12143-ga3272be41209 #835
      [  134.963380] Call Trace:
      [  134.965860] [c6615d60] [c001f76c] panic+0x118/0x260 (unreliable)
      [  134.971775] [c6615dc0] [c001f654] panic+0x0/0x260
      [  134.976435] [c6615dd0] [c032c368] lkdtm_CORRUPT_STACK_STRONG+0x0/0x64
      [  134.982769] [c6615e00] [ffffffff] 0xffffffff
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c3ff2a51