1. 27 Jun 2017, 3 commits
  2. 22 Jun 2017, 1 commit
    • powerpc: Convert VDSO update function to use new update_vsyscall interface · d4cfb113
      Committed by Paul Mackerras
      This converts the powerpc VDSO time update function to use the new
      interface introduced in commit 576094b7 ("time: Introduce new
      GENERIC_TIME_VSYSCALL", 2012-09-11).  Where the old interface gave
      us the time as of the last update in seconds and whole nanoseconds,
      with the new interface we get the nanoseconds part effectively in
      a binary fixed-point format with tk->tkr_mono.shift bits to the
      right of the binary point.
      
      With the old interface, the fractional nanoseconds got truncated,
      meaning that the value returned by the VDSO clock_gettime function
      would have about 1ns of jitter in it compared to the value computed
      by the generic timekeeping code in the kernel.
      
      The powerpc VDSO time functions (clock_gettime and gettimeofday)
      already work in units of 2^-32 seconds, or 0.23283 ns, because that
      makes it simple to split the result into seconds and fractional
      seconds, and represent the fractional seconds in either microseconds
      or nanoseconds.  This is good enough accuracy for now, so this patch
      avoids changing how the VDSO works or the interface in the VDSO data
      page.
      
      This patch converts the powerpc update_vsyscall_old to be called
      update_vsyscall and use the new interface.  We convert the fractional
      second to units of 2^-32 seconds without truncating to whole nanoseconds.
      (There is still a conversion to whole nanoseconds for any legacy users
      of the vdso_data/systemcfg stamp_xtime field.)
      
      In addition, this improves the accuracy of the computation of tb_to_xs
      for those systems with high-frequency timebase clocks (>= 268.5 MHz)
      by doing the right shift in two parts, one before the multiplication and
      one after, rather than doing the right shift before the multiplication.
      (We can't do all of the right shift after the multiplication unless we
      use 128-bit arithmetic.)
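      
      As an illustration of the arithmetic above, here is a standalone sketch
      (not the patch's code; the 2^84/1e9 constant and the particular way the
      shift is split are chosen for the example):
      
      #include <stdint.h>
      #include <stdio.h>
      
      #define NSEC_PER_SEC 1000000000ULL
      
      /* The timekeeper supplies the fractional second as nanoseconds in fixed
       * point, i.e. ns << shift.  Scaling to 2^-32 s units before dividing by
       * 1e9 keeps the sub-nanosecond bits instead of truncating them.
       * (xtime_nsec is below 1e9 << shift, so the left shift fits in 64 bits.) */
      static uint64_t frac_sec_to_2e32(uint64_t xtime_nsec, unsigned int shift)
      {
          return (xtime_nsec << (32 - shift)) / NSEC_PER_SEC;
      }
      
      /* A tb_to_xs-style scale factor has the form mult * 2^(k - shift) / 1e9
       * for some fixed k (k = 84 in this illustration, so the constant below is
       * ~2^84 / 1e9).  Applying the whole ">> shift" to the constant before the
       * multiply throws away 'shift' low bits; splitting the shift keeps some of
       * them while the 64-bit intermediate still cannot overflow for realistic
       * timebase frequencies. */
      static uint64_t tb_to_xs(uint32_t mult, unsigned int shift)
      {
          const uint64_t c = 19342813113834067ULL;      /* ~ 2^84 / 1e9 */
          unsigned int post = (shift >= 2) ? 2 : shift; /* bits shifted after the multiply */
      
          return ((c >> (shift - post)) * mult) >> post;
      }
      
      int main(void)
      {
          /* Example numbers only: a 512 MHz timebase, shift = 24, and mult
           * chosen so that (ticks * mult) >> shift is roughly nanoseconds. */
          unsigned int shift = 24;
          uint32_t mult = (uint32_t)((NSEC_PER_SEC << shift) / 512000000ULL);
      
          printf("frac(0.5 s) = %llu (expect 2^31)\n",
                 (unsigned long long)frac_sec_to_2e32(500000000ULL << shift, shift));
          printf("tb_to_xs    = %#llx\n", (unsigned long long)tb_to_xs(mult, shift));
          return 0;
      }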
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d4cfb113
  3. 21 Jun 2017, 4 commits
  4. 20 Jun 2017, 3 commits
  5. 19 Jun 2017, 7 commits
  6. 15 Jun 2017, 7 commits
    • powerpc/64s: Avoid cpabort in context switch when possible · 07d2a628
      Committed by Nicholas Piggin
      The ISA v3.0B copy-paste facility only requires cpabort when switching
      to a process that has foreign real addresses mapped (direct access to
      accelerators), to clear a potential copy buffer filled by a previous
      thread. There is no accelerator driver implemented yet, so cpabort can
      be removed. It can be re-added when a driver is implemented.
      
      POWER9 DD1 requires the copy buffer to always be cleared on context
      switch, but if accelerators are not in use, then an unpaired copy from
      a dummy region is sufficient to clear data out of the copy buffer.
      
      This increases context switch performance by about 5% on POWER9.
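      
      A minimal sketch of the switch-time policy described above, using
      hypothetical stub functions in place of the actual cpabort and dummy-copy
      instructions (the real change lives in kernel asm, not in C helpers like
      these):
      
      #include <stdbool.h>
      
      /* Stand-ins for the ISA v3.0B instructions: cpabort fully aborts an
       * in-progress copy; an unpaired copy from a dummy cacheline simply
       * overwrites whatever a previous thread left in the copy buffer. */
      static void cpabort_stub(void)         { }
      static void copy_from_dummy_stub(void) { }
      
      void clear_copy_buffer_on_switch(bool next_has_foreign_mappings,
                                       bool is_power9_dd1)
      {
          if (next_has_foreign_mappings) {
              /* Only needed once an accelerator driver exists: a stale copy
               * buffer could otherwise be pasted into an accelerator. */
              cpabort_stub();
          } else if (is_power9_dd1) {
              /* DD1 wants the buffer cleared on every switch; a dummy copy
               * is sufficient when no accelerators are in use. */
              copy_from_dummy_stub();
          }
          /* Otherwise: nothing to do on context switch. */
      }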
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      07d2a628
    • powerpc/64: Drop explicit hwsync in context switch · 9145effd
      Committed by Nicholas Piggin
      The sync (a.k.a. hwsync, a.k.a. heavyweight sync) in the context switch
      code exists to prevent MMIO accesses from being reordered, from the point
      of view of a single process, when it gets migrated to a different CPU. It
      is not required, because an hwsync is already performed earlier in the
      context switch path.
      
      Comment this so it's clear enough if anything changes on the scheduler
      or the powerpc sides. Remove the hwsync from _switch.
      
      This improves context switch performance by 2-3% on POWER8.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9145effd
    • powerpc/64: Drop reservation-clearing ldarx in context switch · 837e72f7
      Committed by Nicholas Piggin
      There is no need to explicitly break the reservation in _switch,
      because we are guaranteed that the context switch path will include a
      larx/stcx.
      
      Comment the guarantee and remove the reservation clear from _switch.
      
      This is worth 1-2% in context switch performance.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      837e72f7
    • powerpc/64s: Leave interrupts hard enabled in context switch for radix · e4c0fc5f
      Committed by Nicholas Piggin
      Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug")
      hard disabled interrupts over the low level context switch, because
      the SLB management can't cope with a PMU interrupt accessing the stack
      in that window.
      
      Radix based kernel mapping does not use the SLB so it does not require
      interrupts hard disabled here.
      
      This is worth 1-2% in context switch performance on POWER9.
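      
      A sketch of the idea, with stubs standing in for the kernel's
      radix_enabled() and hard_irq_disable(); the actual change is in the
      context switch path and this is illustration only:
      
      #include <stdbool.h>
      
      static bool radix_enabled(void)    { return true; }
      static void hard_irq_disable(void) { /* clears MSR[EE] on real hardware */ }
      
      void low_level_switch_prologue(void)
      {
          /* Hash MMU: SLB management in the switch window cannot tolerate a
           * PMU interrupt touching the stack, so hard-disable interrupts.
           * Radix: no SLB is used, so interrupts can stay hard enabled,
           * which is the 1-2% POWER9 win described above. */
          if (!radix_enabled())
              hard_irq_disable();
      
          /* ... fall through to the low level context switch ... */
      }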
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e4c0fc5f
    • powerpc/64: Avoid restore_math call if possible in syscall exit · bc4f65e4
      Committed by Nicholas Piggin
      The syscall exit code that branches to restore_math is quite heavy on
      Book3S, consisting of 2 mtmsr instructions. Threads that don't use both
      FP and vector can get caught here if the kernel ever uses FP or vector.
      Lazy-FP/vec context switching also trips this case.
      
      So check for lazy FP and vector before switching RI for restore_math.
      Move most of this case out of line.
      
      For threads that do want to restore math registers, the MSR switches are
      still suboptimal. Future direction may be to use a soft-RI bit to avoid
      MSR switches in kernel (similar to soft-EE), but for now at least the
      no-restore case is improved.
      
      POWER9 context switch rate increases by about 5% due to sched_yield(2)
      return performance. I haven't constructed a test to measure the syscall
      cost.
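      
      A rough sketch of the fast-path check (the field names here are
      illustrative; the kernel tracks lazy FP/VEC state per thread and moves
      the slow path out of line):
      
      #include <stdbool.h>
      
      struct math_state {
          bool fp_needs_restore;
          bool vec_needs_restore;
      };
      
      void syscall_exit_restore_math(const struct math_state *t)
      {
          /* Fast path: nothing to restore, so skip the RI/MSR switches
           * entirely. */
          if (!t->fp_needs_restore && !t->vec_needs_restore)
              return;
      
          /* Slow path (out of line in the real code): switch RI, call
           * restore_math(), switch RI back. */
      }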
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      bc4f65e4
    • powerpc/64s: Optimize hypercall/syscall entry · acd7d8ce
      Committed by Nicholas Piggin
      After bc355125 ("powerpc/64: Allow for relocation-on interrupts from
      guest to host"), a getppid() system call goes from 307 cycles to 358
      cycles (+17%) on POWER8. This is largely due to the scratch SPR used by
      the hypercall check.
      
      It turns out there are some volatile registers common to both system
      call and hypercall (in particular, r12, cr0, ctr), which can be used to
      avoid the SPR and some other overheads. This brings getppid to 320 cycles
      (+4%).
      
      Testing hcall entry performance by running "sc 1" in guest userspace:
      854 cycles before this patch, 826 after. Also a small win there.
      
      POWER9 syscall is improved by about the same amount, hcall not tested.
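      
      For reference, a crude userspace harness in the spirit of the numbers
      above: time a getppid() loop using the timebase as a rough proxy for
      cycles (this assumes GCC on powerpc64 and is not the harness used for
      the patch; the quoted figures were measured in cycles, not timebase
      ticks):
      
      #include <stdint.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      int main(void)
      {
          enum { N = 1000000 };
          uint64_t start = __builtin_ppc_get_timebase();
      
          for (int i = 0; i < N; i++)
              syscall(SYS_getppid);
      
          uint64_t end = __builtin_ppc_get_timebase();
          printf("%.1f timebase ticks per getppid()\n", (double)(end - start) / N);
          return 0;
      }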
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      acd7d8ce
    • Revert "powerpc: Handle simultaneous interrupts at once" · 0edc2ca9
      Committed by Michael Ellerman
      This reverts commit 45cb08f4.
      
      For some reason this is causing IRQ problems on Freescale Book3E
      machines, e.g. on my p5020ds:
      
        irq 25: nobody cared (try booting with the "irqpoll" option)
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc3-gcc-6.3.1-00037-g45cb08f4 #624
        Call Trace:
        [c0000000fffdbb10] [c00000000049962c] .dump_stack+0xa8/0xe8 (unreliable)
        [c0000000fffdbba0] [c0000000000babf4] .__report_bad_irq+0x54/0x140
        [c0000000fffdbc40] [c0000000000bb11c] .note_interrupt+0x324/0x380
        [c0000000fffdbd00] [c0000000000b7110] .handle_irq_event_percpu+0x68/0x88
        [c0000000fffdbd90] [c0000000000b718c] .handle_irq_event+0x5c/0xa8
        [c0000000fffdbe10] [c0000000000bc01c] .handle_fasteoi_irq+0xe4/0x298
        [c0000000fffdbe90] [c0000000000b59c4] .generic_handle_irq+0x50/0x74
        [c0000000fffdbf10] [c0000000000075d8] .__do_irq+0x74/0x1f0
        [c0000000fffdbf90] [c0000000000189f8] .call_do_irq+0x14/0x24
        [c0000000f7173060] [c0000000000077e4] .do_IRQ+0x90/0x120
        [c0000000f7173100] [c00000000001d93c] exc_0x500_common+0xfc/0x100
        --- interrupt: 501 at .prepare_to_wait_event+0xc/0x14c
            LR = .fsl_elbc_run_command+0xc8/0x23c
        [c0000000f71734d0] [c00000000065f418] .nand_reset+0xb8/0x168
        [c0000000f7173560] [c00000000065fec4] .nand_scan_ident+0x2b0/0x1638
        [c0000000f7173650] [c000000000666cd8] .fsl_elbc_nand_probe+0x34c/0x5f0
        ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
        [c0000000f7173750] [c0000000005a3c60] .platform_drv_probe+0x64/0xb0
        [c0000000f71737d0] [c0000000005a12e0] .really_probe+0x290/0x334
        [c0000000f7173870] [c0000000005a14a0] .__driver_attach+0x11c/0x120
        [c0000000f7173900] [c00000000059e6a0] .bus_for_each_dev+0x98/0xfc
        [c0000000f71739a0] [c0000000005a0b3c] .driver_attach+0x34/0x4c
        [c0000000f7173a20] [c0000000005a04b0] .bus_add_driver+0x1ac/0x2e0
        [c0000000f7173ac0] [c0000000005a2170] .driver_register+0x94/0x160
        [c0000000f7173b40] [c0000000005a3be0] .__platform_driver_register+0x60/0x7c
        [c0000000f7173bc0] [c000000000d6aab4] .fsl_elbc_nand_driver_init+0x24/0x38
        [c0000000f7173c30] [c000000000001934] .do_one_initcall+0x68/0x1b8
        [c0000000f7173d00] [c000000000d210f8] .kernel_init_freeable+0x260/0x338
        [c0000000f7173db0] [c0000000000021b0] .kernel_init+0x20/0xe70
        [c0000000f7173e30] [c0000000000009bc] .ret_from_kernel_thread+0x58/0x9c
        handlers:
        [<c000000000ed85c8>] .fsl_lbc_ctrl_irq
        Disabling IRQ #25
      
      Ben also had concerns with the implementation being potentially slow on
      some PICs, so revert it for now.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      0edc2ca9
  7. 06 Jun 2017, 1 commit
    • powerpc/64s: Machine check handle ifetch from foreign real address for POWER9 · 90df4bfb
      Committed by Nicholas Piggin
      The i-side 0111b machine check, which is "Instruction Fetch to foreign
      address space", was missed by 7b9f71f9 ("powerpc/64s: POWER9 machine
      check handler").
      
          The POWER9 processor core considers host real addresses with a
          nonzero value in RA(8:12) as foreign address space, accessible only
          by the copy and paste instructions. The copy and paste instruction
          pair can be used to invoke the Nest accelerators via the Virtual
          Accelerator Switchboard (VAS).
      
      It is an error for any regular load/store or ifetch to go to a foreign
      address. When relocation is on, this causes an MMU exception; when
      relocation is off, a machine check exception. It is possible to trigger
      this machine check by branching to a foreign address with MSR[IR]=0.
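      
      A small standalone check illustrating the "nonzero RA(8:12)" definition;
      in the ISA's big-endian bit numbering (bit 0 is the MSB), RA(8:12) are
      bits 55..51 of the 64-bit value:
      
      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>
      
      static bool real_addr_is_foreign(uint64_t ra)
      {
          /* Nonzero RA(8:12) marks the real address as foreign
           * (accelerator) space. */
          return ((ra >> 51) & 0x1f) != 0;
      }
      
      int main(void)
      {
          printf("%d\n", real_addr_is_foreign(0x0008000000000000ULL)); /* RA(12) set -> 1 */
          printf("%d\n", real_addr_is_foreign(0x0000ffffffffffffULL)); /* normal host RA -> 0 */
          return 0;
      }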
      
      Fixes: 7b9f71f9 ("powerpc/64s: POWER9 machine check handler")
      Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      90df4bfb
  8. 02 Jun 2017, 6 commits
  9. 30 May 2017, 8 commits
    • powerpc: Link warning for orphan sections · 83a092cf
      Committed by Nicholas Piggin
      Add --orphan-handling=warn to final link flags. This ensures we can
      handle all sections explicitly. This would have caught subtle breakage
      such as 7de3b27b at build-time.
      
      Also bring existing orphan sections into the fold:
      - .text.hot and .text.unlikely are compiler-generated sections.
      - .sdata2, .dynsbss, .plt are used by PPC32.
      - We previously did not specify DWARF_DEBUG or STABS_DEBUG.
      - DWARF_DEBUG did not include all DWARF sections that can be emitted.
      - A number of sections are unused and can be discarded.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      83a092cf
    • powerpc/64: Tool to check head sections location sanity · c494adef
      Committed by Nicholas Piggin
      Use a tool to check that the locations of the "fixed sections" are where
      we expect them to be, which catches cases the linker script can't
      (stubs being added to start of .text section), and which ends up
      being neater.
      
      Sample output:
      
        ERROR: start_text address is c000000000008100, should be c000000000008000
        ERROR: see comments in arch/powerpc/tools/head_check.sh
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Fold in fix from Nick for 4.6 era toolchains]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c494adef
    • powerpc/64: Handle linker stubs in low .text code · 951eedeb
      Committed by Nicholas Piggin
      Very large kernels may require linker stubs for branches from HEAD
      text code. The linker may place these stubs before the HEAD text
      sections, which breaks the assumption that HEAD text is located at 0
      (or that the .text section is located at 0x7000/0x8000 on Book3S
      kernels).
      
      Provide an option to create a small section just before the .text
      section with an empty 256 - 4 bytes, and adjust the start of the .text
      section to match. The linker will tend to put stubs in that section
      and not break our relative-to-absolute offset assumptions.
      
      This causes a small waste of space on common kernels, but allows large
      kernels to build and boot. For now, it is an EXPERT config option,
      defaulting to =n, but a reference is provided for it in the build-time
      check for such breakage. This is good enough for allyesconfig and
      custom users / hackers.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      951eedeb
    • powerpc/[booke|4xx]: Don't clobber TCR[WP] when setting TCR[DIE] · 6e2f03e2
      Committed by Ivan Mikhaylov
      Prevent a kernel panic caused by unintentionally clearing TCR watchdog
      bits. At this point in the kernel boot, the watchdog may have already
      been enabled by u-boot. The original code's attempt to write to the TCR
      register results in an inadvertent clearing of the watchdog
      configuration bits, causing the 476 to reset.
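      
      A sketch of the read-modify-write fix, with a plain variable and
      hypothetical accessors standing in for mfspr/mtspr of the TCR (the DIE
      bit value is shown for illustration; the real defines live in the kernel
      headers):
      
      #include <stdint.h>
      
      static uint32_t tcr_shadow;                        /* stand-in for the SPR */
      static uint32_t read_tcr(void)        { return tcr_shadow; }
      static void     write_tcr(uint32_t v) { tcr_shadow = v; }
      
      #define TCR_DIE (1u << 26)   /* decrementer interrupt enable (illustrative) */
      
      void enable_decrementer_irq(void)
      {
          /* Read-modify-write: set DIE without touching the watchdog bits
           * (TCR[WP] and friends) that u-boot may already have configured.
           * A blind write of just TCR_DIE would clear them and cause the
           * reset described above. */
          write_tcr(read_tcr() | TCR_DIE);
      }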
      Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      6e2f03e2
    • powerpc/powernv/idle: Use Requested Level for restoring state on P9 DD1 · 22c6663d
      Committed by Gautham R. Shenoy
      On POWER9 DD1, due to a hardware bug, the Power-Saving Level Status
      field (PLS) of the PSSCR for a thread waking up from a deep state can
      under-report the state if some other thread in the core is in a shallow
      stop state. The scenario in which this can manifest is as follows:
      
         1) All the threads of the core are in deep stop.
         2) One of the threads is woken up. The PLS for this thread will
            correctly reflect that it is waking up from deep stop.
         3) The thread that has woken up now executes a shallow stop.
         4) When some other thread in the core is woken, its PLS will reflect
            the shallow stop state.
      
      Thus, the subsequent thread for which the PLS is under-reporting the
      wakeup state will not restore the hypervisor resources.
      
      Hence, on DD1 systems, use the Requested Level (RL) field as a
      workaround to restore the contents of the hypervisor resources on the
      wakeup from the stop state.
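      
      A sketch of the DD1 workaround; the PSSCR field positions follow the ISA
      (PLS = PSSCR[0:3], RL = PSSCR[60:63]), while the helper names and the
      deep-state threshold are illustrative:
      
      #include <stdbool.h>
      #include <stdint.h>
      
      static unsigned int psscr_pls(uint64_t psscr) { return (psscr >> 60) & 0xf; }
      static unsigned int psscr_rl(uint64_t psscr)  { return psscr & 0xf; }
      
      bool must_restore_hyp_state(uint64_t psscr, bool is_power9_dd1,
                                  unsigned int first_state_losing_hyp_state)
      {
          /* DD1 workaround: PLS can under-report the wakeup state when a
           * sibling thread sat in a shallow stop, so trust the Requested
           * Level instead. */
          unsigned int level = is_power9_dd1 ? psscr_rl(psscr) : psscr_pls(psscr);
      
          return level >= first_state_losing_hyp_state;
      }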
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      22c6663d
    • powerpc/powernv/idle: Restore LPCR on wakeup from deep-stop · cb0be7ec
      Committed by Gautham R. Shenoy
      On wakeup from a deep stop state which is supposed to lose the
      hypervisor state, we don't restore the LPCR to the old value but set
      it to a "sane" value via cur_cpu_spec->cpu_restore().
      
      The problem is that the "sane" value doesn't include UPRT and the HR
      bits which are required to run correctly in Radix mode.
      
      Fix this on POWER9 onwards by restoring the LPCR to whatever value it
      had before executing the stop instruction.
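      
      A sketch of the ordering described above, with hypothetical stand-ins
      for the SPR accessors and the stop entry (illustration only, not the
      kernel code):
      
      #include <stdbool.h>
      #include <stdint.h>
      
      static uint64_t lpcr_shadow;                      /* stand-in for the SPR */
      static uint64_t read_lpcr(void)         { return lpcr_shadow; }
      static void     write_lpcr(uint64_t v)  { lpcr_shadow = v; }
      static void     enter_deep_stop(void)   { /* may lose hypervisor state */ }
      static uint64_t sane_default_lpcr(void) { return 0; /* lacks UPRT/HR */ }
      
      void deep_stop_and_wakeup(bool power9_or_later)
      {
          uint64_t saved_lpcr = read_lpcr();   /* includes UPRT/HR under radix */
      
          enter_deep_stop();
      
          if (power9_or_later)
              write_lpcr(saved_lpcr);              /* restore what we had */
          else
              write_lpcr(sane_default_lpcr());     /* old behaviour */
      }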
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      cb0be7ec
    • powerpc/powernv/idle: Decouple Timebase restore & Per-core SPRs restore · ec486735
      Committed by Gautham R. Shenoy
      On POWER8:
         -  nap: both timebase and hypervisor state are retained.
         -  fast-sleep: timebase is lost, but the hypervisor state is retained.
         -  winkle: both timebase and hypervisor state are lost.
      
      Hence, the current code for handling exit from an idle state assumes
      that if the timebase value is retained, then so is the hypervisor
      state. Thus, the current code doesn't restore per-core hypervisor
      state in such cases.
      
      But that is no longer the case on POWER9 where we do have stop states
      in which timebase value is retained, but the hypervisor state is
      lost. So we have to ensure that the per-core hypervisor state gets
      restored in such cases.
      
      Fix this by ensuring that even in the case when timebase is retained,
      we explicitly check if we are waking up from a deep stop that loses
      per-core hypervisor state (indicated by cr4 being eq or gt), and if
      this is the case, we restore the per-core hypervisor state.
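      
      A sketch of the decoupled checks, with stub restore functions standing
      in for the real wakeup code:
      
      #include <stdbool.h>
      
      static void restore_timebase(void)      { }
      static void restore_per_core_sprs(void) { }
      
      void idle_wakeup(bool timebase_lost, bool deep_state_lost_core_state)
      {
          /* POWER8 could infer one of these from the other; on POWER9 the two
           * are independent, so test them separately (the "cr4 eq or gt" test
           * mentioned above is the deep-state check here). */
          if (timebase_lost)
              restore_timebase();
      
          if (deep_state_lost_core_state)
              restore_per_core_sprs();
      }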
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ec486735