1. 31 10月, 2018 6 次提交
  2. 30 10月, 2018 1 次提交
  3. 29 10月, 2018 1 次提交
  4. 26 10月, 2018 10 次提交
  5. 23 10月, 2018 2 次提交
  6. 21 10月, 2018 3 次提交
  7. 20 10月, 2018 17 次提交
    • C
      powerpc/traps: restore recoverability of machine_check interrupts · daf00ae7
      Christophe Leroy 提交于
      commit b96672dd ("powerpc: Machine check interrupt is a non-
      maskable interrupt") added a call to nmi_enter() at the beginning of
      machine check restart exception handler. Due to that, in_interrupt()
      always returns true regardless of the state before entering the
      exception, and die() panics even when the system was not already in
      interrupt.
      
      This patch calls nmi_exit() before calling die() in order to restore
      the interrupt state we had before calling nmi_enter()
      
      Fixes: b96672dd ("powerpc: Machine check interrupt is a non-maskable interrupt")
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      daf00ae7
    • N
      powerpc/64/module: REL32 relocation range check · b851ba02
      Nicholas Piggin 提交于
      The recent module relocation overflow crash demonstrated that we
      have no range checking on REL32 relative relocations. This patch
      implements a basic check, the same kernel that previously oopsed
      and rebooted now continues with some of these errors when loading
      the module:
      
        module_64: x_tables: REL32 527703503449812 out of range!
      
      Possibly other relocations (ADDR32, REL16, TOC16, etc.) should also have
      overflow checks.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b851ba02
    • N
    • M
      selftests/powerpc: Add a test of wild bctr · b7683fc6
      Michael Ellerman 提交于
      This tests that a bctr (Branch to counter and link), ie. a function
      call, to a wildly out-of-bounds address is handled correctly.
      
      Some old kernel versions didn't handle it correctly, see eg:
      
        "powerpc/slb: Force a full SLB flush when we insert for a bad EA"
        https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-April/157397.htmlSigned-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b7683fc6
    • M
      powerpc/mm: Fix page table dump to work on Radix · 0d923962
      Michael Ellerman 提交于
      When we're running on Book3S with the Radix MMU enabled the page table
      dump currently prints the wrong addresses because it uses the wrong
      start address.
      
      Fix it to use PAGE_OFFSET rather than KERN_VIRT_START.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0d923962
    • M
      powerpc/mm/radix: Display if mappings are exec or not · afb6d064
      Michael Ellerman 提交于
      At boot we print the ranges we've mapped for the linear mapping and
      what page size we've used. Also track whether the range is mapped
      executable or not and display that as well.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      afb6d064
    • M
      powerpc/mm/radix: Simplify split mapping logic · 232aa407
      Michael Ellerman 提交于
      If we look closely at the logic in create_physical_mapping(), when
      we're doing STRICT_KERNEL_RWX, we do the following steps:
        - determine the gap from where we are to the end of the range
        - choose an appropriate mapping_size based on the gap
        - check if that mapping_size would overlap the __init_begin
          boundary, and if not choose an appropriate mapping_size
      
      We can simplify the logic by taking the __init_begin boundary into
      account when we calculate the initial gap.
      
      So add a next_boundary() function which tells us what the next
      boundary is, either the __init_begin boundary or end. In future we can
      add more boundaries.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      232aa407
    • M
      powerpc/mm/radix: Remove the retry in the split mapping logic · 57306c66
      Michael Ellerman 提交于
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
      linear mapping at the text/data boundary so we can map the kernel
      text read only.
      
      The current logic uses a goto inside the for loop, which works, but is
      hard to reason about.
      
      When we hit the goto retry case we set max_mapping_size to PMD_SIZE
      and go back to the start.
      
      Setting max_mapping_size means we skip the PUD case and go to the PMD
      case.
      
      We know we will pass the alignment and gap checks because the only
      reason we are there is we hit the goto retry, and that is guarded by
      mapping_size == PUD_SIZE, which means addr is PUD aligned and gap is
      greater or equal to PUD_SIZE.
      
      So the only part of the check that can fail is the mmu_psize_defs
      check for the 2M page size.
      
      If we just duplicate that check we can avoid the goto, and we get the
      same result.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      57306c66
    • M
      powerpc/mm/radix: Fix small page at boundary when splitting · 81d1b54d
      Michael Ellerman 提交于
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
      linear mapping at the text/data boundary so we can map the kernel
      text read only.
      
      Currently we always use a small page at the text/data boundary, even
      when that's not necessary:
      
        Mapped 0x0000000000000000-0x0000000000e00000 with 2.00 MiB pages
        Mapped 0x0000000000e00000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
      
      This is because the check that the mapping crosses the __init_begin
      boundary is too strict, it also returns true when we map exactly up to
      the boundary.
      
      So fix it to check that the mapping would actually map past
      __init_begin, and with that we see:
      
        Mapped 0x0000000000000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      81d1b54d
    • M
      powerpc/mm/radix: Fix overuse of small pages in splitting logic · 3b5657ed
      Michael Ellerman 提交于
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
      linear mapping at the text/data boundary so we can map the kernel text
      read only.
      
      But the current logic uses small pages for the entire text section,
      regardless of whether a larger page size would fit. eg. with the
      boundary at 16M we could use 2M pages, but instead we use 64K pages up
      to the 16M boundary:
      
        Mapped 0x0000000000000000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      This is because the test is checking if addr is < __init_begin
      and addr + mapping_size is >= _stext. But that is true for all pages
      between _stext and __init_begin.
      
      Instead what we want to check is if we are crossing the text/data
      boundary, which is at __init_begin. With that fixed we see:
      
        Mapped 0x0000000000000000-0x0000000000e00000 with 2.00 MiB pages
        Mapped 0x0000000000e00000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      ie. we're correctly using 2MB pages below __init_begin, but we still
      drop down to 64K pages unnecessarily at the boundary.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3b5657ed
    • M
      powerpc/mm/radix: Fix off-by-one in split mapping logic · 5c6499b7
      Michael Ellerman 提交于
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we try to split the
      kernel linear (1:1) mapping so that the kernel text is in a separate
      page to kernel data, so we can mark the former read-only.
      
      We could achieve that just by always using 64K pages for the linear
      mapping, but we try to be smarter. Instead we use huge pages when
      possible, and only switch to smaller pages when necessary.
      
      However we have an off-by-one bug in that logic, which causes us to
      calculate the wrong boundary between text and data.
      
      For example with the end of the kernel text at 16M we see:
      
        radix-mmu: Mapped 0x0000000000000000-0x0000000001200000 with 64.0 KiB pages
        radix-mmu: Mapped 0x0000000001200000-0x0000000040000000 with 2.00 MiB pages
        radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      ie. we mapped from 0 to 18M with 64K pages, even though the boundary
      between text and data is at 16M.
      
      With the fix we see we're correctly hitting the 16M boundary:
      
        radix-mmu: Mapped 0x0000000000000000-0x0000000001000000 with 64.0 KiB pages
        radix-mmu: Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5c6499b7
    • N
      powerpc/ftrace: Handle large kernel configs · 67361cf8
      Naveen N. Rao 提交于
      Currently, we expect to be able to reach ftrace_caller() from all
      ftrace-enabled functions through a single relative branch. With large
      kernel configs, we see functions outside of 32MB of ftrace_caller()
      causing ftrace_init() to bail.
      
      In such configurations, gcc/ld emits two types of trampolines for mcount():
      1. A long_branch, which has a single branch to mcount() for functions that
         are one hop away from mcount():
      	c0000000019e8544 <00031b56.long_branch._mcount>:
      	c0000000019e8544:	4a 69 3f ac 	b       c00000000007c4f0 <._mcount>
      
      2. A plt_branch, for functions that are farther away from mcount():
      	c0000000051f33f8 <0008ba04.plt_branch._mcount>:
      	c0000000051f33f8:	3d 82 ff a4 	addis   r12,r2,-92
      	c0000000051f33fc:	e9 8c 04 20 	ld      r12,1056(r12)
      	c0000000051f3400:	7d 89 03 a6 	mtctr   r12
      	c0000000051f3404:	4e 80 04 20 	bctr
      
      We can reuse those trampolines for ftrace if we can have those
      trampolines go to ftrace_caller() instead. However, with ABIv2, we
      cannot depend on r2 being valid. As such, we use only the long_branch
      trampolines by patching those to instead branch to ftrace_caller or
      ftrace_regs_caller.
      
      In addition, we add additional trampolines around .text and .init.text
      to catch locations that are covered by the plt branches. This allows
      ftrace to work with most large kernel configurations.
      
      For now, we always patch the trampolines to go to ftrace_regs_caller,
      which is slightly inefficient. This can be optimized further at a later
      point.
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      67361cf8
    • A
      powerpc/mm: Fix WARN_ON with THP NUMA migration · dd0e144a
      Aneesh Kumar K.V 提交于
      WARNING: CPU: 12 PID: 4322 at /arch/powerpc/mm/pgtable-book3s64.c:76 set_pmd_at+0x4c/0x2b0
       Modules linked in:
       CPU: 12 PID: 4322 Comm: qemu-system-ppc Tainted: G        W         4.19.0-rc3-00758-g8f0c636b0542 #36
       NIP:  c0000000000872fc LR: c000000000484eec CTR: 0000000000000000
       REGS: c000003fba876fe0 TRAP: 0700   Tainted: G        W          (4.19.0-rc3-00758-g8f0c636b0542)
       MSR:  900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24282884  XER: 00000000
       CFAR: c000000000484ee8 IRQMASK: 0
       GPR00: c000000000484eec c000003fba877268 c000000001f0ec00 c000003fbd229f80
       GPR04: 00007c8fe8e00000 c000003f864c5a38 860300853e0000c0 0000000000000080
       GPR08: 0000000080000000 0000000000000001 0401000000000080 0000000000000001
       GPR12: 0000000000002000 c000003fffff5400 c000003fce292000 00007c9024570000
       GPR16: 0000000000000000 0000000000ffffff 0000000000000001 c000000001885950
       GPR20: 0000000000000000 001ffffc0004807c 0000000000000008 c000000001f49d05
       GPR24: 00007c8fe8e00000 c0000000020f2468 ffffffffffffffff c000003fcd33b090
       GPR28: 00007c8fe8e00000 c000003fbd229f80 c000003f864c5a38 860300853e0000c0
       NIP [c0000000000872fc] set_pmd_at+0x4c/0x2b0
       LR [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
       Call Trace:
       [c000003fba877268] [c00000000045931c] mpol_misplaced+0x1bc/0x230 (unreliable)
       [c000003fba8772c8] [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
       [c000003fba877398] [c00000000040d344] __handle_mm_fault+0x5e4/0x2300
       [c000003fba8774d8] [c00000000040f400] handle_mm_fault+0x3a0/0x420
       [c000003fba877528] [c0000000003ff6f4] __get_user_pages+0x2e4/0x560
       [c000003fba877628] [c000000000400314] get_user_pages_unlocked+0x104/0x2a0
       [c000003fba8776c8] [c000000000118f44] __gfn_to_pfn_memslot+0x284/0x6a0
       [c000003fba877748] [c0000000001463a0] kvmppc_book3s_radix_page_fault+0x360/0x12d0
       [c000003fba877838] [c000000000142228] kvmppc_book3s_hv_page_fault+0x48/0x1300
       [c000003fba877988] [c00000000013dc08] kvmppc_vcpu_run_hv+0x1808/0x1b50
       [c000003fba877af8] [c000000000126b44] kvmppc_vcpu_run+0x34/0x50
       [c000003fba877b18] [c000000000123268] kvm_arch_vcpu_ioctl_run+0x288/0x2d0
       [c000003fba877b98] [c00000000011253c] kvm_vcpu_ioctl+0x1fc/0x8c0
       [c000003fba877d08] [c0000000004e9b24] do_vfs_ioctl+0xa44/0xae0
       [c000003fba877db8] [c0000000004e9c44] ksys_ioctl+0x84/0xf0
       [c000003fba877e08] [c0000000004e9cd8] sys_ioctl+0x28/0x80
      
      We removed the pte_protnone check earlier with the understanding that we
      mark the pte invalid before the set_pte/set_pmd usage. But the huge pmd
      autonuma still use the set_pmd_at directly. This is ok because a protnone pte
      won't have translation cache in TLB.
      
      Fixes: da7ad366 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      dd0e144a
    • M
      selftests/powerpc: Fix out-of-tree build errors · d8a2fe29
      Michael Ellerman 提交于
      Some of our Makefiles don't do the right thing when building the
      selftests with O=, fix them up.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d8a2fe29
    • C
      powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selected · 51eeef9e
      Christophe Leroy 提交于
      If CONFIG_PPC_SPLPAR is not selected, steal_time will always
      be NUL, so accounting it is pointless
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      51eeef9e
    • C
      powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64 · abcff86d
      Christophe Leroy 提交于
      scaled cputime is only meaningfull when the processor has
      SPURR and/or PURR, which means only on PPC64.
      
      Removing it on PPC32 significantly reduces the size of
      vtime_account_system() and vtime_account_idle() on an 8xx:
      
      Before:
      00000000 l     F .text	000000a8 vtime_delta
      00000280 g     F .text	0000010c vtime_account_system
      0000038c g     F .text	00000048 vtime_account_idle
      
      After:
      (vtime_delta gets inlined inside the two functions)
      000001d8 g     F .text	000000a0 vtime_account_system
      00000278 g     F .text	00000038 vtime_account_idle
      
      In terms of performance, we also get approximatly 7% improvement on
      task switch. The following small benchmark app is run with perf stat:
      
      void *thread(void *arg)
      {
      	int i;
      
      	for (i = 0; i < atoi((char*)arg); i++)
      		pthread_yield();
      }
      
      int main(int argc, char **argv)
      {
      	pthread_t th1, th2;
      
      	pthread_create(&th1, NULL, thread, argv[1]);
      	pthread_create(&th2, NULL, thread, argv[1]);
      	pthread_join(th1, NULL);
      	pthread_join(th2, NULL);
      
      	return 0;
      }
      
      Before the patch:
      
       Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):
      
             8228.476465      task-clock (msec)         #    0.954 CPUs utilized            ( +-  0.23% )
                  200004      context-switches          #    0.024 M/sec                    ( +-  0.00% )
      
      After the patch:
      
       Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):
      
             7649.070444      task-clock (msec)         #    0.955 CPUs utilized            ( +-  0.27% )
                  200004      context-switches          #    0.026 M/sec                    ( +-  0.00% )
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      abcff86d
    • C
      powerpc/time: isolate scaled cputime accounting in dedicated functions. · b38a181c
      Christophe Leroy 提交于
      scaled cputime is only meaningfull when the processor has
      SPURR and/or PURR, which means only on PPC64.
      
      In preparation of the following patch that will remove
      CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC32, this patch moves
      all scaled cputing accounting logic into dedicated functions.
      
      This patch doesn't change any functionality. It's only code
      reorganisation.
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b38a181c