1. 17 Aug, 2017 1 commit
    • M
      membarrier: Provide expedited private command · 22e4ebb9
      Mathieu Desnoyers committed
      Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs using cpumask built
      from all runqueues for which current thread's mm is the same as the
      thread calling sys_membarrier. It executes faster than the non-expedited
      variant (no blocking). It also works on NOHZ_FULL configurations.
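
From userspace, the new command might be used roughly as follows. This is a minimal sketch, not code from the patch: the helper names `membarrier_call` and `try_private_expedited` are made up for illustration, the command values are copied locally from the uapi header `linux/membarrier.h`, and the registration step is an assumption about kernels newer than this commit, which can require it before accepting the expedited command.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Command values mirrored from <linux/membarrier.h>; defined locally so the
 * sketch also builds against older headers. */
enum {
    CMD_QUERY                      = 0,
    CMD_PRIVATE_EXPEDITED          = 1 << 3,
    CMD_REGISTER_PRIVATE_EXPEDITED = 1 << 4, /* assumption: added after this commit */
};

/* Hypothetical thin wrapper around the raw system call. */
static long membarrier_call(int cmd, int flags)
{
#ifdef SYS_membarrier
    return syscall(SYS_membarrier, cmd, flags);
#else
    (void)cmd;
    (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}

/* Returns 0 if the expedited barrier was issued, -1 if the command is
 * unavailable or fails. */
static int try_private_expedited(void)
{
    long mask = membarrier_call(CMD_QUERY, 0);

    if (mask < 0 || !(mask & CMD_PRIVATE_EXPEDITED))
        return -1;

    /* Register first when the kernel advertises a registration command. */
    if ((mask & CMD_REGISTER_PRIVATE_EXPEDITED) &&
        membarrier_call(CMD_REGISTER_PRIVATE_EXPEDITED, 0) != 0)
        return -1;

    return membarrier_call(CMD_PRIVATE_EXPEDITED, 0) == 0 ? 0 : -1;
}
```

A caller would fall back to a heavier synchronization scheme when `try_private_expedited()` returns -1.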
      
      Scheduler-wise, it requires a memory barrier before and after context
      switching between processes (which have different mm). The memory
      barrier before context switch is already present. For the barrier after
      context switch:
      
      * Our TSO archs can do RELEASE without being a full barrier. Look at
        x86 spin_unlock() being a regular STORE for example.  But for those
        archs, all atomics imply smp_mb and all of them have atomic ops in
        switch_mm() for mm_cpumask(), and on x86 the CR3 load acts as a full
        barrier.
      
      * From all weakly ordered machines, only ARM64 and PPC can do RELEASE,
        the rest does indeed do smp_mb(), so there the spin_unlock() is a full
        barrier and we're good.
      
      * ARM64 has a very heavy barrier in switch_to(), which suffices.
      
      * PPC just removed its barrier from switch_to(), but appears to be
        talking about adding something to switch_mm(). So add a
        smp_mb__after_unlock_lock() for now, until this is settled on the PPC
        side.
      
      Changes since v3:
      - Properly document the memory barriers provided by each architecture.
      
      Changes since v2:
      - Address comments from Peter Zijlstra,
      - Add smp_mb__after_unlock_lock() after finish_lock_switch() in
        finish_task_switch() to add the memory barrier we need after storing
        to rq->curr. This is much simpler than the previous approach relying
        on atomic_dec_and_test() in mmdrop(), which actually added a memory
        barrier in the common case of switching between userspace processes.
      - Return -EINVAL when MEMBARRIER_CMD_SHARED is used on a nohz_full
        kernel, rather than having the whole membarrier system call returning
        -ENOSYS. Indeed, CMD_PRIVATE_EXPEDITED is compatible with nohz_full.
        Adapt the CMD_QUERY mask accordingly.
      
      Changes since v1:
      - move membarrier code under kernel/sched/ because it uses the
        scheduler runqueue,
      - only add the barrier when we switch from a kernel thread. The case
        where we switch from a user-space thread is already handled by
        the atomic_dec_and_test() in mmdrop().
      - add a comment to mmdrop() documenting the requirement on the implicit
        memory barrier.
      
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      CC: Boqun Feng <boqun.feng@gmail.com>
      CC: Andrew Hunter <ahh@google.com>
      CC: Maged Michael <maged.michael@gmail.com>
      CC: gromer@google.com
      CC: Avi Kivity <avi@scylladb.com>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Dave Watson <davejwatson@fb.com>
  2. 03 Jul, 2017 1 commit
    • L
      arm64: PCI: Drop DT IRQ allocation from pcibios_alloc_irq() · 769b461f
      Lorenzo Pieralisi committed
      With the introduction of struct pci_host_bridge.map_irq pointer it is
      possible to assign IRQs for all devices originating from a PCI host bridge
      at probe time; this is implemented through pci_assign_irq() that relies on
      the struct pci_host_bridge.map_irq pointer to map IRQ for a given device.
      
      The benefits this brings are twofold:
      
        - the IRQ for a device is assigned once at probe time
        - the IRQ assignment works also for hotplugged devices
      
      With all DT based PCI host bridges converted to the struct
      pci_host_bridge.{map/swizzle}_irq hooks mechanism the DT IRQ allocation in
      ARM64 pcibios_alloc_irq() is now redundant and can be removed.
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
  3. 30 Jun, 2017 4 commits
  4. 29 Jun, 2017 9 commits
  5. 24 Jun, 2017 2 commits
    • M
      arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels · 8486e54d
      Mark Rutland committed
      When a kernel is built without CONFIG_ARM64_MODULE_PLTS, we don't
      generate the expected branch instruction in ftrace_make_nop(). This
      means we pass zero (rather than a valid branch) to ftrace_modify_code()
      as the expected instruction to validate. This causes us to return
      -EINVAL to the core ftrace code for a valid case, resulting in a splat
      at boot time.
      
      This was an unintended effect of commit:
      
        68764420 ("arm64: ftrace: fix building without CONFIG_MODULES")
      
      ... which incorrectly moved the generation of the branch instruction
      into the ifdef for CONFIG_ARM64_MODULE_PLTS.
      
      This patch fixes the issue by moving the ifdef inside of the relevant
      if-else case, and always checking that the branch is in range,
      regardless of CONFIG_ARM64_MODULE_PLTS. This ensures that we generate
      the expected branch instruction, and also improves our sanity checks.
      
      For consistency, both ftrace_make_nop() and ftrace_make_call() are
      updated with this pattern.
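
The unconditional range check can be illustrated with a small standalone sketch. It is not the kernel code: `bl_in_range` is a hypothetical helper, and the only fact it relies on is that an A64 `BL` encodes a signed 26-bit word offset, which gives a byte reach of [-128 MiB, +128 MiB).

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_128M (128LL * 1024 * 1024)

/* True if a BL placed at pc can reach target without a trampoline. */
static bool bl_in_range(uint64_t pc, uint64_t target)
{
    int64_t offset = (int64_t)(target - pc);

    return offset >= -SZ_128M && offset < SZ_128M;
}
```

With a check of this shape done regardless of CONFIG_ARM64_MODULE_PLTS, an out-of-range target can be rejected up front, and the PLT path only needs to be considered when the option is enabled.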
      
      Fixes: 68764420 ("arm64: ftrace: fix building without CONFIG_MODULES")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • D
      arm64: signal: Allow expansion of the signal frame · 33f08261
      Dave Martin committed
      This patch defines an extra_context signal frame record that can be
      used to describe an expanded signal frame, and modifies the context
      block allocator and signal frame setup and parsing code to create,
      populate, parse and decode this block as necessary.
      
      To avoid abuse by userspace, parse_user_sigframe() attempts to
      ensure that:
      
       * no more than one extra_context is accepted;
       * the extra context data is a sensible size, and properly placed
         and aligned.
      
      The extra_context data is required to start at the first 16-byte
      aligned address immediately after the dummy terminator record
      following extra_context in rt_sigframe.__reserved[] (as ensured
      during signal delivery).  This serves as a sanity-check that the
      signal frame has not been moved or copied without taking the extra
      data into account.
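
The placement rule can be sketched as a small address computation. This is an illustration only: `align_up` and `expected_extra_data` are hypothetical helpers, and the record sizes are passed in as parameters because the sketch does not assume the real structure layouts.

```c
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Address at which the extra data is expected to start, given the address of
 * the extra_context record and the sizes of that record and of the
 * terminator record that follows it. */
static uintptr_t expected_extra_data(uintptr_t extra_ctx,
                                     size_t extra_ctx_size,
                                     size_t terminator_size)
{
    return align_up(extra_ctx + extra_ctx_size + terminator_size, 16);
}
```

A parser following the rule above would compare the data pointer stored in extra_context against this expected address and reject the frame on a mismatch.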
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      [will: add __force annotation when casting extra_datap to __user pointer]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  6. 22 Jun, 2017 4 commits
    • M
      arm64: dump cpu_hwcaps at panic time · 8effeaaf
      Mark Rutland committed
      When debugging a kernel panic(), it can be useful to know which CPU
      features have been detected by the kernel, as some code paths can depend
      on these (and may have been patched at runtime).
      
      This patch adds a notifier to dump the detected CPU caps (as a hex
      string) at panic(), when we log other information useful for debugging.
      On a Juno R1 system running v4.12-rc5, this looks like:
      
      [  615.431249] Kernel panic - not syncing: Fatal exception in interrupt
      [  615.437609] SMP: stopping secondary CPUs
      [  615.441872] Kernel Offset: disabled
      [  615.445372] CPU features: 0x02086
      [  615.448522] Memory Limit: none
      
      A developer can decode this by looking at the corresponding
      <asm/cpucaps.h> bits. For example, the above decodes as:
      
      * bit  1: ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE
      * bit  2: ARM64_WORKAROUND_845719
      * bit  7: ARM64_WORKAROUND_834220
      * bit 13: ARM64_HAS_32BIT_EL0
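
Decoding the hex string by hand is error-prone, so a small helper can list the set bits. This is a sketch, not part of the patch: `decode_cpu_features` is a hypothetical name, and it only handles values that fit in 64 bits. Mapping each index to a capability name still requires the <asm/cpucaps.h> of the kernel that produced the log.

```c
#include <stddef.h>
#include <stdlib.h>

/* Parse a hex string such as "0x02086" and store the indices of the set
 * bits in bits[], lowest first. Returns the number of indices stored. */
static size_t decode_cpu_features(const char *hex, unsigned int *bits, size_t max)
{
    unsigned long long value = strtoull(hex, NULL, 16);
    size_t n = 0;

    for (unsigned int bit = 0; value != 0 && n < max; bit++, value >>= 1) {
        if (value & 1)
            bits[n++] = bit;
    }
    return n;
}
```

For the log line above, `decode_cpu_features("0x02086", bits, 8)` yields the indices 1, 2, 7 and 13, matching the list in the commit message.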
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Steve Capper <steve.capper@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • D
      arm64: ptrace: Flush user-RW TLS reg to thread_struct before reading · 936eb65c
      Dave Martin committed
      When reading current's user-writable TLS register (which occurs
      when dumping core for native tasks), it is possible that userspace
      has modified it since the time the task was last scheduled out.
      The new TLS register value is not guaranteed to have been written
      immediately back to thread_struct in this case.
      
      As a result, a coredump can capture stale data for this register.
      Reading the register for a stopped task via ptrace is unaffected.
      
      For native tasks, this patch explicitly flushes the TPIDR_EL0
      register back to thread_struct before dumping when operating on
      current, thus ensuring that coredump contents are up to date.  For
      compat tasks, the TLS register is not user-writable and so cannot
      be out of sync, so no flush is required in compat_tls_get().
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • D
      arm64: ptrace: Flush FPSIMD regs back to thread_struct before reading · e1d5a8fb
      Dave Martin committed
      When reading the FPSIMD state of current (which occurs when dumping
      core), it is possible that userspace has modified the FPSIMD
      registers since the time the task was last scheduled out.  Such
      changes are not guaranteed to be reflected immediately in
      thread_struct.
      
      As a result, a coredump can contain stale values for these
      registers.  Reading the registers of a stopped task via ptrace is
      unaffected.
      
      This patch explicitly flushes the CPU state back to thread_struct
      before dumping when operating on current, thus ensuring that
      coredump contents are up to date.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • D
      arm64: ptrace: Fix VFP register dumping in compat coredumps · af66b2d8
      Dave Martin committed
      Currently, VFP registers are omitted from coredumps for compat
      processes, due to a bug in the REGSET_COMPAT_VFP regset
      implementation.
      
      compat_vfp_get() needs to transfer non-contiguous data from
      thread_struct.fpsimd_state, and uses put_user() to handle the
      offending trailing word (FPSCR).  This fails when copying to a
      kernel address (i.e., kbuf && !ubuf), which is what happens when
      dumping core.  As a result, the ELF coredump core code silently
      omits the NT_ARM_VFP note from the dump.
      
      It would be possible to work around this with additional special
      case code for the put_user(), but since user_regset_copyout() is
      explicitly designed to handle this scenario it is cleaner to port
      the put_user() to a user_regset_copyout() call, which this patch
      does.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  7. 21 Jun, 2017 1 commit
    • J
      time: Clean up CLOCK_MONOTONIC_RAW time handling · fc6eead7
      John Stultz committed
      Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
      remove the duplicative tk->raw_time.tv_nsec, which can be
      stored in tk->tkr_raw.xtime_nsec (similarly to how it is handled
      for monotonic time).
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
  8. 20 Jun, 2017 5 commits
    • D
      arm64: signal: factor out signal frame record allocation · bb4322f7
      Dave Martin committed
      This patch factors out the allocator for signal frame optional
      records into a separate function, to ensure consistency and
      facilitate later expansion.
      
      No overrun checking is currently done, because the allocation is in
      user memory and anyway the kernel never tries to allocate enough
      space in the signal frame yet for an overrun to occur.  This
      behaviour will be refined in future patches.
      
      The approach taken in this patch to allocation of the terminator
      record is not very clean: this will also be replaced in subsequent
      patches.
      
      For future extension, a comment is added in sigcontext.h
      documenting the current static allocations in __reserved[].  This
      will be important for determining under what circumstances
      userspace may or may not see an expanded signal frame.
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • D
      arm64: signal: factor frame layout and population into separate passes · bb4891a6
      Dave Martin committed
      In preparation for expanding the signal frame, this patch refactors
      the signal frame setup code in setup_sigframe() into two separate
      passes.
      
      The first pass, setup_sigframe_layout(), determines the size of the
      signal frame and its internal layout, including the presence and
      location of optional records.  The resulting knowledge is used to
      allocate and locate the user stack space required for the signal
      frame and to determine which optional records to include.
      
      The second pass, setup_sigframe(), is called once the stack frame
      is allocated in order to populate it with the necessary context
      information.
      
      As a result of these changes, it becomes more natural to represent
      locations in the signal frame by a base pointer and an offset,
      since the absolute address of each location is not known during the
      layout pass.  To be more consistent with this logic,
      parse_user_sigframe() is refactored to describe signal frame
      locations in a similar way.
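
The two-pass idea, with locations expressed as base plus offset, can be sketched in plain C. Everything here is hypothetical and much simpler than the kernel code: `struct frame_layout`, `layout_frame` and `populate_frame` are illustrative names, and the record sizes are arbitrary parameters rather than the real arm64 sizes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout descriptor: offsets are relative to the frame base. */
struct frame_layout {
    size_t size;          /* total bytes needed for the frame */
    size_t fpsimd_offset; /* 0 means "record not present" (header_size > 0) */
    size_t end_offset;    /* where the terminator record goes */
};

/* Pass 1: decide what goes where, without touching any memory. */
static void layout_frame(struct frame_layout *l, size_t header_size,
                         int want_fpsimd, size_t fpsimd_size,
                         size_t terminator_size)
{
    l->size = header_size;
    l->fpsimd_offset = 0;
    if (want_fpsimd) {
        l->fpsimd_offset = l->size;
        l->size += fpsimd_size;
    }
    l->end_offset = l->size;
    l->size += terminator_size;
}

/* Pass 2: the base is known now, so each location is base + offset. */
static void populate_frame(uint8_t *base, const struct frame_layout *l,
                           const void *fpsimd, size_t fpsimd_size,
                           size_t terminator_size)
{
    if (l->fpsimd_offset)
        memcpy(base + l->fpsimd_offset, fpsimd, fpsimd_size);
    memset(base + l->end_offset, 0, terminator_size);
}
```

The caller allocates `l.size` bytes between the two passes, which is why the first pass can only deal in offsets.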
      
      This change has no effect on the signal ABI, but will make it
      easier to expand the signal frame in future patches.
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • D
      arm64: signal: Refactor sigcontext parsing in rt_sigreturn · 47ccb028
      Dave Martin committed
      Currently, rt_sigreturn does very limited checking on the
      sigcontext coming from userspace.
      
      Future additions to the sigcontext data will increase the potential
      for surprises.  Also, it is not clear whether the sigcontext
      extension records are supposed to occur in a particular order.
      
      To allow the parsing code to be extended more easily, this patch
      factors out the sigcontext parsing into a separate function, and
      adds extra checks to validate the well-formedness of the sigcontext
      structure.
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • D
      arm64: signal: split frame link record from sigcontext structure · 20987de3
      Dave Martin committed
      In order to be able to increase the amount of the data currently
      written to the __reserved[] array in the signal frame, it is
      necessary to overwrite the locations currently occupied by the
      {fp,lr} frame link record pushed at the top of the signal stack.
      
      In order for this to work, this patch detaches the frame link
      record from struct rt_sigframe and places it separately at the top
      of the signal stack.  This will allow subsequent patches to insert
      data between it and __reserved[].
      
      This change relies on the non-ABI status of the placement of the
      frame record with respect to struct sigframe: this status is
      undocumented, but the placement is not declared or described in the
      user headers, and known unwinder implementations (libgcc,
      libunwind, gdb) appear not to rely on it.
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • W
      arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW · dbb236c1
      Will Deacon committed
      Recently vDSO support for CLOCK_MONOTONIC_RAW was added in
      49eea433 ("arm64: Add support for CLOCK_MONOTONIC_RAW in
      clock_gettime() vDSO"). Noticing that the core timekeeping code
      never set tkr_raw.xtime_nsec, the vDSO implementation didn't
      bother exposing it via the data page and instead took the
      unshifted tk->raw_time.tv_nsec value which was then immediately
      shifted left in the vDSO code.
      
      Unfortunately, by accelerating the MONOTONIC_RAW clockid, it
      uncovered potential 1ns time inconsistencies caused by the
      timekeeping core not handling sub-ns resolution.
      
      Now that the core code has been fixed and is actually setting
      tkr_raw.xtime_nsec, we need to take that into account in the
      vDSO by adding it to the shifted raw_time value, in order to
      fix the user-visible inconsistency. Rather than do that at each
      use (and expand the data page in the process), instead perform
      the shift/addition operation when populating the data page and
      remove the shift from the vDSO code entirely.
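
The shift/addition step can be shown with a few lines of arithmetic. This is a sketch of the idea only: the parameter names follow the fields mentioned above, while the function names and the shift value used in the example are hypothetical.

```c
#include <stdint.h>

/* Value published in the data page: whole nanoseconds shifted up, plus the
 * already-shifted sub-nanosecond remainder. */
static uint64_t raw_nsec_for_datapage(uint64_t raw_tv_nsec,
                                      uint64_t tkr_raw_xtime_nsec,
                                      uint32_t shift)
{
    return (raw_tv_nsec << shift) + tkr_raw_xtime_nsec;
}

/* What a reader does after adding its own shifted cycle delta. */
static uint64_t shifted_to_ns(uint64_t shifted_nsec, uint32_t shift)
{
    return shifted_nsec >> shift;
}
```

With a shift of 8, 5 ns plus a half-nanosecond remainder (0x80) is published as 1408. A reader that adds a further half-nanosecond delta gets 6 ns. Had the remainder been dropped, the same reader would compute 5 ns, which is the kind of 1 ns inconsistency described above.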
      
      [jstultz: minor whitespace tweak, tried to improve commit
       message to make it more clear this fixes a regression]
      Reported-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Acked-by: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: "stable #4 . 8+" <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-4-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  9. 15 Jun, 2017 2 commits
  10. 14 Jun, 2017 1 commit
  11. 12 Jun, 2017 1 commit
    • W
      arm64: ftrace: fix building without CONFIG_MODULES · 68764420
      Will Deacon committed
      When CONFIG_MODULES is disabled, we cannot dereference a module pointer:
      
      arch/arm64/kernel/ftrace.c: In function 'ftrace_make_call':
      arch/arm64/kernel/ftrace.c:107:36: error: dereferencing pointer to incomplete type 'struct module'
         trampoline = (unsigned long *)mod->arch.ftrace_trampoline;
      
      Also, the within_module() function is not defined:
      
      arch/arm64/kernel/ftrace.c: In function 'ftrace_make_nop':
      arch/arm64/kernel/ftrace.c:171:8: error: implicit declaration of function 'within_module'; did you mean 'init_module'? [-Werror=implicit-function-declaration]
      
      This addresses both by replacing the IS_ENABLED(CONFIG_ARM64_MODULE_PLTS)
      checks with #ifdef versions.
      
      Fixes: e71a4e1b ("arm64: ftrace: add support for far branches to dynamic ftrace")
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  12. 07 Jun, 2017 3 commits
  13. 05 Jun, 2017 2 commits
    • A
      efi/arm: Enable DMI/SMBIOS · bb817bef
      Ard Biesheuvel committed
      Wire up the existing arm64 support for SMBIOS tables (aka DMI) for ARM as
      well, by moving the arm64 init code to drivers/firmware/efi/arm-runtime.c
      (which is shared between ARM and arm64), and adding an asm/dmi.h header to
      ARM that defines the mapping routines for the firmware tables.
      
      This allows userspace to access these tables to discover system information
      exposed by the firmware. It also sets the hardware name used in crash
      dumps, e.g.:
      
        Unable to handle kernel NULL pointer dereference at virtual address 00000000
        pgd = ed3c0000
        [00000000] *pgd=bf1f3835
        Internal error: Oops: 817 [#1] SMP THUMB2
        Modules linked in:
        CPU: 0 PID: 759 Comm: bash Not tainted 4.10.0-09601-g0e8f38792120-dirty #112
        Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
        ^^^
      
      NOTE: This does *NOT* enable or encourage the use of DMI quirks, i.e.,
            the practice of identifying the platform via DMI to decide whether
            certain workarounds for buggy hardware and/or firmware need to be
            enabled. This would require the DMI subsystem to be enabled much
            earlier than we do on ARM, which is non-trivial.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170602135207.21708-14-ard.biesheuvel@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • W
      arm64: cpufeature: Fix CPU_OUT_OF_SPEC taint for uniform systems · 8dd0ee65
      Will Deacon committed
      Commit 3fde2999 ("arm64: cpufeature: Don't dump useless backtrace on
      CPU_OUT_OF_SPEC") changed the cpufeature detection code to use add_taint
      instead of WARN_TAINT_ONCE when detecting a heterogeneous system with
      mismatched feature support. Unfortunately, this resulted in all systems
      getting the taint, regardless of any feature mismatch.
      
      This patch fixes the problem by conditionalising the taint on detecting
      a feature mismatch.
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  14. 03 Jun, 2017 4 commits