1. 25 Oct 2017: 1 commit
  2. 24 Oct 2017: 1 commit
    • arm64: Avoid aligning normal memory pointers in __memcpy_{to,from}io · 9ca255bf
      Committed by Mark Salyzyn
      __memcpy_{to,from}io fall back to byte-at-a-time copying unless both
      the source and destination pointers are 8-byte aligned. Since one of
      the pointers always points at normal memory, this is unnecessary and
      detrimental to performance, so only do byte copying until we hit an
      8-byte boundary for the device pointer.
      
      This change was motivated by performance issues in the pstore driver.
      On a test platform with a 1/4 MB console buffer and a 1/2 MB pmsg
      buffer, pstore probe time was in the 90-107 ms range. This change
      reduced it to 10-25 ms, improving boot time.
      
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Mark Salyzyn <salyzyn@android.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
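A minimal user-space sketch of the alignment strategy above, assuming ordinary memory stands in for the device. `fake_writeb()`, `fake_writeq()` and `sketch_memcpy_toio()` are hypothetical names; the kernel uses its own raw MMIO accessors. Only the device (destination) pointer's alignment decides when 64-bit stores start; the normal-memory source is read with `memcpy()` so an unaligned source stays well-defined C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for raw byte/64-bit MMIO stores. */
static void fake_writeb(uint8_t v, volatile void *addr) { *(volatile uint8_t *)addr = v; }
static void fake_writeq(uint64_t v, volatile void *addr) { *(volatile uint64_t *)addr = v; }

/* Copy count bytes from normal memory (from) to "device" memory (to). */
static void sketch_memcpy_toio(volatile void *to, const void *from, size_t count)
{
    const uint8_t *src = from;
    volatile uint8_t *dst = to;

    /* Byte stores only until the device pointer is 8-byte aligned. */
    while (count && ((uintptr_t)dst & 7)) {
        fake_writeb(*src++, dst++);
        count--;
    }
    /* Aligned 64-bit device stores; the source may still be unaligned. */
    while (count >= 8) {
        uint64_t v;
        memcpy(&v, src, sizeof(v));
        fake_writeq(v, dst);
        src += 8;
        dst += 8;
        count -= 8;
    }
    /* Remaining tail bytes. */
    while (count) {
        fake_writeb(*src++, dst++);
        count--;
    }
}
```

With this approach, a source and destination that disagree in their alignment modulo 8 no longer force a full byte-by-byte copy.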
  3. 20 Oct 2017: 1 commit
    • arm64: Fix the feature type for ID register fields · 5bdecb79
      Committed by Suzuki K Poulose
      Now that the ARM ARM clearly specifies the rules for inferring
      the values of the ID register fields, fix the types of the
      feature bits we have in the kernel.
      
      Section D10.1.4, "Principles of the ID scheme for fields in ID
      registers", of ARM ARM DDI0487B.b lists the registers to which the
      scheme applies, along with the exceptions.

      This patch changes the relevant feature bits from FTR_EXACT
      to FTR_LOWER_SAFE to select the safer value. This will enable
      an older kernel running on a new CPU to detect the safer option
      rather than completely disabling the feature.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
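A small sketch of why FTR_LOWER_SAFE keeps a feature that FTR_EXACT would drop when CPUs report different field values. The enum and `sketch_ftr_safe_value()` are hypothetical mirrors of the two policies, not the kernel's data structures; `safe_val` is assumed to be the value used when an exact match fails (often 0, meaning "not implemented").

```c
#include <stdint.h>

/* Hypothetical mirror of two feature-field policies. */
enum sketch_ftr_type {
    SKETCH_FTR_EXACT,      /* all CPUs must agree, else use safe_val */
    SKETCH_FTR_LOWER_SAFE, /* the smaller value is the safe one */
};

/* Combine one CPU's field value (new_val) with the system-wide value (cur_val). */
static int64_t sketch_ftr_safe_value(enum sketch_ftr_type type, int64_t new_val,
                                     int64_t cur_val, int64_t safe_val)
{
    if (new_val == cur_val)
        return cur_val;
    switch (type) {
    case SKETCH_FTR_LOWER_SAFE:
        return new_val < cur_val ? new_val : cur_val;
    case SKETCH_FTR_EXACT:
    default:
        return safe_val;
    }
}
```

For example, if one CPU reports 2 and another 1 for a field, the EXACT policy falls back to `safe_val` (disabling the feature), while LOWER_SAFE keeps level 1.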
  4. 18 Oct 2017: 1 commit
  5. 11 Oct 2017: 1 commit
    • arm64: Expose support for optional ARMv8-A features · f5e035f8
      Committed by Suzuki K Poulose
      ARMv8-A adds a few optional features for ARMv8.2 and ARMv8.3.
      Expose them to userspace via HWCAPs and mrs emulation.

      SHA2-512 - Instruction support for the SHA512 hash algorithm (e.g. SHA512H,
                 SHA512H2, SHA512SU0, SHA512SU1)
      SHA3     - SHA3 crypto instructions (EOR3, RAX1, XAR, BCAX)
      SM3      - Instruction support for the Chinese cryptography algorithm SM3
      SM4      - Instruction support for the Chinese cryptography algorithm SM4
      DP       - Dot Product instructions (UDOT, SDOT)
      
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
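A user-space sketch of checking the new HWCAPs. The bit positions below are assumed to match the arm64 uapi `<asm/hwcap.h>` and are defined only if the system headers do not already provide them; on arm64 Linux the word comes from `getauxval(AT_HWCAP)`, elsewhere the sketch reports 0. `report_optional_features()` and `read_hwcap()` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdio.h>
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#endif

/* Assumed arm64 HWCAP bit positions (check <asm/hwcap.h>). */
#ifndef HWCAP_SHA3
#define HWCAP_SHA3    (1UL << 17)
#endif
#ifndef HWCAP_SM3
#define HWCAP_SM3     (1UL << 18)
#endif
#ifndef HWCAP_SM4
#define HWCAP_SM4     (1UL << 19)
#endif
#ifndef HWCAP_ASIMDDP
#define HWCAP_ASIMDDP (1UL << 20)
#endif
#ifndef HWCAP_SHA512
#define HWCAP_SHA512  (1UL << 21)
#endif

static const struct { const char *name; unsigned long bit; } optional_feats[] = {
    { "sha3", HWCAP_SHA3 }, { "sm3", HWCAP_SM3 }, { "sm4", HWCAP_SM4 },
    { "asimddp", HWCAP_ASIMDDP }, { "sha512", HWCAP_SHA512 },
};

/* AT_HWCAP on arm64 Linux, 0 on any other platform. */
static unsigned long read_hwcap(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return getauxval(AT_HWCAP);
#else
    return 0;
#endif
}

/* Print each optional feature and return how many are present. */
static int report_optional_features(unsigned long hwcap)
{
    int present = 0;

    for (size_t i = 0; i < sizeof(optional_feats) / sizeof(optional_feats[0]); i++) {
        int has = (hwcap & optional_feats[i].bit) != 0;
        printf("%-8s %s\n", optional_feats[i].name, has ? "yes" : "no");
        present += has;
    }
    return present;
}
```

Typical use is `report_optional_features(read_hwcap())`.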
  6. 04 Oct 2017: 1 commit
    • arm64: consistently log boot/secondary CPU IDs · ccaac162
      Committed by Mark Rutland
      Currently we inconsistently log identifying information for the boot CPU
      and secondary CPUs. For the boot CPU, we log the MIDR and MPIDR across
      separate messages, whereas for the secondary CPUs we only log the MIDR.
      
      In some cases, it would be useful to know the MPIDR of secondary CPUs,
      and it would be nice for these messages to be consistent.
      
      This patch ensures that in the primary and secondary boot paths, we log
      both the MPIDR and MIDR in a single message, with a consistent format.
      The MPIDR is consistently padded to 10 hex characters to cover Aff3 in
      bits 39:32, so that IDs can be compared easily.
      
      The newly redundant message in setup_arch() is removed.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Al Stone <ahs3@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      [will: added '0x' prefixes consistently]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
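A sketch of the consistent ID format described above: the MPIDR zero-padded to 10 hex digits (so Aff3 in bits 39:32 always fits) and the MIDR to 8, both with a `0x` prefix. `format_cpu_ids()` and its exact wording are hypothetical; the kernel's actual message text may differ.

```c
#include <stdint.h>
#include <stdio.h>

/* Format both CPU IDs with fixed widths so log lines line up. */
static int format_cpu_ids(char *buf, size_t len, uint64_t mpidr, uint32_t midr)
{
    return snprintf(buf, len, "MPIDR 0x%010llx MIDR 0x%08x",
                    (unsigned long long)mpidr, (unsigned int)midr);
}
```

For an MPIDR with Aff3 = 1 and Aff0 = 1 (0x100000001), this yields "MPIDR 0x0100000001 MIDR 0x410fd034" for a MIDR of 0x410fd034.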
  7. 02 Oct 2017: 2 commits
    • arm64: remove unneeded copy to init_utsname()->machine · c2f0b54f
      Committed by Masahiro Yamada
      As you can see in init/version.c, init_uts_ns.name.machine is initially
      set to UTS_MACHINE. There is no point in copying the same string.

      I dug through the git history to figure out why this line is here. My
      best guess is as follows:

       - This line has been around here since the initial support of arm64
         by commit 9703d9d7 ("arm64: Kernel booting and initialisation").
         If ARCH (=arm64) and UTS_MACHINE (=aarch64) do not match,
         arch/$(ARCH)/Makefile is supposed to override UTS_MACHINE, but the
         initial version of arch/arm64/Makefile failed to do that. Instead,
         the boot code copied "aarch64" to init_utsname()->machine.

       - Commit 94ed1f2c ("arm64: setup: report ELF_PLATFORM as the
         machine for utsname") replaced "aarch64" with ELF_PLATFORM to
         make "uname" reflect the endianness.

       - ELF_PLATFORM does not help to provide the UTS machine name to the
         rpm target, so commit cfa88c79 ("arm64: Set UTS_MACHINE in the
         Makefile") fixed it. That commit simply replaced ELF_PLATFORM with
         UTS_MACHINE, but missed the fact that the string copy itself is no
         longer needed.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: move TASK_* definitions to <asm/processor.h> · eef94a3d
      Committed by Yury Norov
      The ILP32 series [1] introduces a dependency on <asm/is_compat.h> for
      the TASK_SIZE macro, which in turn requires <asm/thread_info.h>. Since
      <asm/thread_info.h> includes <asm/memory.h>, where TASK_SIZE is
      currently located, this creates a circular dependency.
      
      In other architectures, TASK_SIZE is defined in <asm/processor.h>, and
      moving TASK_SIZE there fixes the problem.
      
      Discussion: https://patchwork.kernel.org/patch/9929107/
      
      [1] https://github.com/norov/linux/tree/ilp32-next
      
      CC: Will Deacon <will.deacon@arm.com>
      CC: Laura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Suggested-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  8. 27 Sep 2017: 1 commit
  9. 18 Sep 2017: 2 commits
  10. 14 Sep 2017: 1 commit
  11. 09 Sep 2017: 1 commit
  12. 23 Aug 2017: 5 commits
    • arm64: cleanup {COMPAT_,}SET_PERSONALITY() macro · d1be5c99
      Committed by Yury Norov
      There is some work that should be done after setting the personality.
      Currently it is done in the macro, which is not the best idea.

      This patch introduces a new arch_setup_new_exec() routine and moves all
      the setup code there, as suggested by Catalin:
      https://lkml.org/lkml/2017/8/4/494
      
      Cc: Pratyush Anand <panand@redhat.com>
      Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
      [catalin.marinas@arm.com: comments changed or removed]
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: kaslr: Adjust the offset to avoid Image across alignment boundary · a067d94d
      Committed by Catalin Marinas
      With 16KB pages and a kernel Image larger than 16MB, the current
      kaslr_early_init() logic for avoiding mappings across swapper table
      boundaries fails since increasing the offset by kimg_sz just moves the
      problem to the next boundary.
      
      This patch rounds the offset down to a multiple of
      (1 << SWAPPER_TABLE_SHIFT) if the Image crosses a PMD_SIZE boundary.
      
      Fixes: afd0e5a8 ("arm64: kaslr: Fix up the kernel image alignment")
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
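A sketch of the rounding rule above, with the table-level window size passed in as a shift. `sketch_adjust_kaslr_offset()` is a hypothetical helper, and the 32 MB window (shift 25) used in the checks is only an example value. It assumes the un-randomised image already fits inside one window, so shifting it by a whole multiple of the window size keeps it inside one.

```c
#include <stdint.h>

/* If [text + offset, end + offset) straddles a (1 << table_shift) boundary,
 * round the offset down to a multiple of the window size. */
static uint64_t sketch_adjust_kaslr_offset(uint64_t text, uint64_t end,
                                           uint64_t offset, unsigned int table_shift)
{
    if (((text + offset) >> table_shift) != ((end + offset) >> table_shift))
        offset &= ~((UINT64_C(1) << table_shift) - 1);
    return offset;
}
```

Rounding down (rather than adding the image size, as the old logic did) guarantees both ends land in the same window under that assumption.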
    • arm64: kaslr: ignore modulo offset when validating virtual displacement · 4a23e56a
      Committed by Ard Biesheuvel
      In the KASLR setup routine, we ensure that the early virtual mapping
      of the kernel image does not cover more than a single table entry at
      the level above the swapper block level, so that the assembler routines
      involved in setting up this mapping can remain simple.
      
      In this calculation we add the proposed KASLR offset to the values of
      the _text and _end markers, and reject it if they would end up falling
      in different swapper table sized windows.
      
      However, when taking the addresses of _text and _end, the modulo offset
      (the physical displacement modulo 2 MB) is already accounted for, so
      adding it again gives the wrong result. So drop the modulo offset from
      the calculation.
      
      Fixes: 08cdac61 ("arm64: relocatable: deal with physically misaligned ...")
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Tested-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: fpsimd: Prevent registers leaking across exec · 09662210
      Committed by Dave Martin
      There are some tricky dependencies between the different stages of
      flushing the FPSIMD register state during exec, and these can race
      with context switch in ways that can cause the old task's regs to
      leak across.  In particular, a context switch during the memset() can
      cause some of the task's old FPSIMD registers to reappear.
      
      Disabling preemption for this small window would be no big deal for
      performance: preemption is already disabled for similar scenarios
      like updating the FPSIMD registers in sigreturn.
      
      So, instead of rearranging things in ways that might swap existing
      subtle bugs for new ones, this patch just disables preemption
      around the FPSIMD state flushing so that races of this type can't
      occur here.  This brings fpsimd_flush_thread() into line with other
      code paths.
      
      Cc: stable@vger.kernel.org
      Fixes: 674c242c ("arm64: flush FP/SIMD state correctly after execve()")
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: introduce separated bits for mm_context_t flags · 5ce93ab6
      Committed by Yury Norov
      Currently the mm->context.flags field uses thread_info flags, which is
      not the best idea for many reasons. For example, mm_context_t doesn't
      need most of the thread_info flags, and it would be difficult to add a
      new mm-related flag if needed, because it could easily clash with the
      TIF ones.

      To deal with this, a new MMCF_AARCH32 flag is introduced for
      mm_context_t->flags, where the MMCF prefix stands for mm_context_t
      flags. Also, accesses to the mm_context_t flags require neither
      atomicity nor ordering, so set/clear_bit() are replaced with simple
      masks.
      Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
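A sketch of the idea: a dedicated flag namespace for mm_context_t, updated with plain masks instead of atomic bitops. `SKETCH_MMCF_AARCH32` and the struct are hypothetical mirrors, not the kernel definitions. Plain read-modify-write is only correct under the commit's premise that these flags see no concurrent writers.

```c
#include <stdbool.h>

/* Hypothetical mm_context_t-only flag, independent of the TIF_* numbering. */
#define SKETCH_MMCF_AARCH32 0x1UL

struct sketch_mm_context {
    unsigned long flags;
};

static void sketch_set_aarch32(struct sketch_mm_context *ctx)
{
    ctx->flags |= SKETCH_MMCF_AARCH32;   /* plain OR, no set_bit() */
}

static void sketch_clear_aarch32(struct sketch_mm_context *ctx)
{
    ctx->flags &= ~SKETCH_MMCF_AARCH32;  /* plain AND-NOT, no clear_bit() */
}

static bool sketch_is_aarch32(const struct sketch_mm_context *ctx)
{
    return (ctx->flags & SKETCH_MMCF_AARCH32) != 0;
}
```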
  13. 22 Aug 2017: 1 commit
    • arm64: kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores · a88ce63b
      Committed by Hoeun Ryu
      Commit 0ee59413 ("x86/panic: replace smp_send_stop() with kdump friendly
      version in panic path") introduced crash_smp_send_stop(), which is a
      weak function and can be overridden by architecture code to fix the
      side effect caused by commit f06e5153 ("kernel/panic.c: add
      "crash_kexec_post_notifiers" option").

      arm64 uses the weak version of the function, and the problem is that
      the weak function simply calls smp_send_stop(), which takes the other
      CPUs offline and removes the chance to save crash information for
      nonpanic CPUs in machine_crash_shutdown() when the
      crash_kexec_post_notifiers kernel option is enabled.

      Calling smp_send_crash_stop() in machine_crash_shutdown() is useless in
      this case, because all nonpanic CPUs have already been taken offline by
      smp_send_stop(), and smp_send_crash_stop() only works on online CPUs.

      The result is that the secondary CPUs' registers are not saved by
      crash_save_cpu(), and the vmcore file misreports these CPUs as being
      offline.

      crash_smp_send_stop() is implemented to fix this problem by replacing
      the existing smp_send_crash_stop() and adding a check for multiple
      calls to the function. The function (strong symbol version) saves crash
      information for nonpanic CPUs, and machine_crash_shutdown() tries to
      save crash information for nonpanic CPUs only when the
      crash_kexec_post_notifiers kernel option is disabled.
      
      * crash_kexec_post_notifiers : false
      
        panic()
          __crash_kexec()
            machine_crash_shutdown()
              crash_smp_send_stop()    <= save crash dump for nonpanic cores
      
      * crash_kexec_post_notifiers : true
      
        panic()
          crash_smp_send_stop()        <= save crash dump for nonpanic cores
          __crash_kexec()
            machine_crash_shutdown()
              crash_smp_send_stop()    <= just return.
      Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com>
      Reviewed-by: James Morse <james.morse@arm.com>
      Tested-by: James Morse <james.morse@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
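A sketch of the "only once" guard shown in the call graphs above: whichever path calls first does the work, and the second call (from machine_crash_shutdown() when crash_kexec_post_notifiers is true) simply returns. The function names and the counter are hypothetical, and the sketch assumes a single caller at a time, as on the serialized panic path.

```c
#include <stdio.h>

/* Illustration only: counts how often the stop path really ran. */
static int stop_path_runs;

/* Hypothetical stand-in for saving nonpanic CPUs' registers and parking them. */
static void sketch_save_and_stop_other_cpus(void)
{
    stop_path_runs++;
    puts("saving crash state for nonpanic CPUs");
}

static void sketch_crash_smp_send_stop(void)
{
    static int cpus_stopped;

    if (cpus_stopped)   /* second caller: already done, just return */
        return;
    cpus_stopped = 1;
    sketch_save_and_stop_other_cpus();
}
```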
  14. 21 Aug 2017: 1 commit
    • arm64: Move PTE_RDONLY bit handling out of set_pte_at() · 73e86cb0
      Committed by Catalin Marinas
      Currently PTE_RDONLY is treated as a hardware only bit and not handled
      by the pte_mkwrite(), pte_wrprotect() or the user PAGE_* definitions.
      The set_pte_at() function is responsible for setting this bit based on
      the write permission or dirty state. This patch moves the PTE_RDONLY
      handling out of set_pte_at into the pte_mkwrite()/pte_wrprotect()
      functions. The PAGE_* definitions need to be updated to explicitly
      include PTE_RDONLY when !PTE_WRITE.
      
      The patch also removes the redundant PAGE_COPY(_EXEC) definitions as
      they are identical to the corresponding PAGE_READONLY(_EXEC).
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
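A sketch of the new division of labour: the write-permission helpers keep PTE_RDONLY in step with PTE_WRITE themselves. The bit positions below are assumed for illustration (check `pgtable-hwdef.h`), the names are hypothetical, and dirty-state tracking is deliberately left out.

```c
#include <stdint.h>

/* Assumed, illustrative bit positions: AP[2] read-only and the write bit. */
#define SKETCH_PTE_RDONLY (UINT64_C(1) << 7)
#define SKETCH_PTE_WRITE  (UINT64_C(1) << 51)

typedef uint64_t sketch_pte_t;

/* Making a PTE writable clears the hardware read-only bit in the same step. */
static sketch_pte_t sketch_pte_mkwrite(sketch_pte_t pte)
{
    pte |= SKETCH_PTE_WRITE;
    pte &= ~SKETCH_PTE_RDONLY;
    return pte;
}

/* Write-protecting sets PTE_RDONLY directly instead of leaving it to set_pte_at(). */
static sketch_pte_t sketch_pte_wrprotect(sketch_pte_t pte)
{
    pte &= ~SKETCH_PTE_WRITE;
    pte |= SKETCH_PTE_RDONLY;
    return pte;
}
```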
  15. 19 Aug 2017: 1 commit
  16. 17 Aug 2017: 1 commit
    • membarrier: Provide expedited private command · 22e4ebb9
      Committed by Mathieu Desnoyers
      Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs, using a cpumask
      built from all runqueues whose current thread has the same mm as the
      thread calling sys_membarrier. It executes faster than the
      non-expedited variant (no blocking). It also works on NOHZ_FULL
      configurations.
      
      Scheduler-wise, it requires a memory barrier before and after context
      switching between processes (which have different mm). The memory
      barrier before context switch is already present. For the barrier after
      context switch:
      
      * Our TSO archs can do RELEASE without being a full barrier. Look at
        x86 spin_unlock() being a regular STORE for example.  But for those
        archs, all atomics imply smp_mb and all of them have atomic ops in
        switch_mm() for mm_cpumask(), and on x86 the CR3 load acts as a full
        barrier.
      
      * Of all the weakly ordered machines, only ARM64 and PPC can do RELEASE;
        the rest do indeed do smp_mb(), so there the spin_unlock() is a full
        barrier and we're good.
      
      * ARM64 has a very heavy barrier in switch_to(), which suffices.
      
      * PPC just removed its barrier from switch_to(), but appears to be
        talking about adding something to switch_mm(). So add a
        smp_mb__after_unlock_lock() for now, until this is settled on the PPC
        side.
      
      Changes since v3:
      - Properly document the memory barriers provided by each architecture.
      
      Changes since v2:
      - Address comments from Peter Zijlstra,
      - Add smp_mb__after_unlock_lock() after finish_lock_switch() in
        finish_task_switch() to add the memory barrier we need after storing
        to rq->curr. This is much simpler than the previous approach relying
        on atomic_dec_and_test() in mmdrop(), which actually added a memory
        barrier in the common case of switching between userspace processes.
      - Return -EINVAL when MEMBARRIER_CMD_SHARED is used on a nohz_full
        kernel, rather than having the whole membarrier system call returning
        -ENOSYS. Indeed, CMD_PRIVATE_EXPEDITED is compatible with nohz_full.
        Adapt the CMD_QUERY mask accordingly.
      
      Changes since v1:
      - move membarrier code under kernel/sched/ because it uses the
        scheduler runqueue,
      - only add the barrier when we switch from a kernel thread. The case
        where we switch from a user-space thread is already handled by
        the atomic_dec_and_test() in mmdrop().
      - add a comment to mmdrop() documenting the requirement on the implicit
        memory barrier.
      
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      CC: Boqun Feng <boqun.feng@gmail.com>
      CC: Andrew Hunter <ahh@google.com>
      CC: Maged Michael <maged.michael@gmail.com>
      CC: gromer@google.com
      CC: Avi Kivity <avi@scylladb.com>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Dave Watson <davejwatson@fb.com>
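A user-space sketch of issuing the new command, assuming Linux kernel headers from 4.16 or later. glibc has no membarrier() wrapper, so the call goes through syscall(2). Note that the registration step is not part of this commit: it became mandatory in Linux 4.16, and on kernels that lack it the call simply fails and is ignored here. `try_private_expedited_barrier()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static int sys_membarrier(int cmd, unsigned int flags)
{
    return (int)syscall(__NR_membarrier, cmd, flags);
}

/* Returns 0 if a private expedited barrier was issued, -1 otherwise. */
static int try_private_expedited_barrier(void)
{
    int mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);

    if (mask < 0 || !(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED))
        return -1;
    /* Needed since Linux 4.16; a failure here is left for the next call to report. */
    (void)sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
    return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0 ? 0 : -1;
}
```

Querying first lets a program fall back to another synchronization scheme when the command is unavailable.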
  17. 16 Aug 2017: 9 commits
    • arm64: add VMAP_STACK overflow detection · 872d8327
      Committed by Mark Rutland
      This patch adds stack overflow detection to arm64, usable when vmap'd stacks
      are in use.
      
      Overflow is detected in a small preamble executed for each exception entry,
      which checks whether there is enough space on the current stack for the general
      purpose registers to be saved. If there is not enough space, the overflow
      handler is invoked on a per-cpu overflow stack. This approach preserves the
      original exception information in ESR_EL1 (and where appropriate, FAR_EL1).
      
      Task and IRQ stacks are aligned to double their size, enabling overflow to be
      detected with a single bit test. For example, a 16K stack is aligned to 32K,
      ensuring that bit 14 of the SP must be zero. On an overflow (or underflow),
      this bit is flipped. Thus, overflow (of less than the size of the stack) can be
      detected by testing whether this bit is set.
      
      The overflow check is performed before any attempt is made to access the
      stack, avoiding recursive faults (and the loss of exception information
      these would entail). As logical operations cannot be performed on the SP
      directly, the SP is temporarily swapped with a general purpose register
      using arithmetic operations to enable the test to be performed.
      
      This gives us a useful error message on stack overflow, as can be
      triggered with the LKDTM overflow test:
      
      [  305.388749] lkdtm: Performing direct entry OVERFLOW
      [  305.395444] Insufficient stack space to handle exception!
      [  305.395482] ESR: 0x96000047 -- DABT (current EL)
      [  305.399890] FAR: 0xffff00000a5e7f30
      [  305.401315] Task stack:     [0xffff00000a5e8000..0xffff00000a5ec000]
      [  305.403815] IRQ stack:      [0xffff000008000000..0xffff000008004000]
      [  305.407035] Overflow stack: [0xffff80003efce4e0..0xffff80003efcf4e0]
      [  305.409622] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5
      [  305.412785] Hardware name: linux,dummy-virt (DT)
      [  305.415756] task: ffff80003d051c00 task.stack: ffff00000a5e8000
      [  305.419221] PC is at recursive_loop+0x10/0x48
      [  305.421637] LR is at recursive_loop+0x38/0x48
      [  305.423768] pc : [<ffff00000859f330>] lr : [<ffff00000859f358>] pstate: 40000145
      [  305.428020] sp : ffff00000a5e7f50
      [  305.430469] x29: ffff00000a5e8350 x28: ffff80003d051c00
      [  305.433191] x27: ffff000008981000 x26: ffff000008f80400
      [  305.439012] x25: ffff00000a5ebeb8 x24: ffff00000a5ebeb8
      [  305.440369] x23: ffff000008f80138 x22: 0000000000000009
      [  305.442241] x21: ffff80003ce65000 x20: ffff000008f80188
      [  305.444552] x19: 0000000000000013 x18: 0000000000000006
      [  305.446032] x17: 0000ffffa2601280 x16: ffff0000081fe0b8
      [  305.448252] x15: ffff000008ff546d x14: 000000000047a4c8
      [  305.450246] x13: ffff000008ff7872 x12: 0000000005f5e0ff
      [  305.452953] x11: ffff000008ed2548 x10: 000000000005ee8d
      [  305.454824] x9 : ffff000008545380 x8 : ffff00000a5e8770
      [  305.457105] x7 : 1313131313131313 x6 : 00000000000000e1
      [  305.459285] x5 : 0000000000000000 x4 : 0000000000000000
      [  305.461781] x3 : 0000000000000000 x2 : 0000000000000400
      [  305.465119] x1 : 0000000000000013 x0 : 0000000000000012
      [  305.467724] Kernel panic - not syncing: kernel stack overflow
      [  305.470561] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5
      [  305.473325] Hardware name: linux,dummy-virt (DT)
      [  305.475070] Call trace:
      [  305.476116] [<ffff000008088ad8>] dump_backtrace+0x0/0x378
      [  305.478991] [<ffff000008088e64>] show_stack+0x14/0x20
      [  305.481237] [<ffff00000895a178>] dump_stack+0x98/0xb8
      [  305.483294] [<ffff0000080c3288>] panic+0x118/0x280
      [  305.485673] [<ffff0000080c2e9c>] nmi_panic+0x6c/0x70
      [  305.486216] [<ffff000008089710>] handle_bad_stack+0x118/0x128
      [  305.486612] Exception stack(0xffff80003efcf3a0 to 0xffff80003efcf4e0)
      [  305.487334] f3a0: 0000000000000012 0000000000000013 0000000000000400 0000000000000000
      [  305.488025] f3c0: 0000000000000000 0000000000000000 00000000000000e1 1313131313131313
      [  305.488908] f3e0: ffff00000a5e8770 ffff000008545380 000000000005ee8d ffff000008ed2548
      [  305.489403] f400: 0000000005f5e0ff ffff000008ff7872 000000000047a4c8 ffff000008ff546d
      [  305.489759] f420: ffff0000081fe0b8 0000ffffa2601280 0000000000000006 0000000000000013
      [  305.490256] f440: ffff000008f80188 ffff80003ce65000 0000000000000009 ffff000008f80138
      [  305.490683] f460: ffff00000a5ebeb8 ffff00000a5ebeb8 ffff000008f80400 ffff000008981000
      [  305.491051] f480: ffff80003d051c00 ffff00000a5e8350 ffff00000859f358 ffff00000a5e7f50
      [  305.491444] f4a0: ffff00000859f330 0000000040000145 0000000000000000 0000000000000000
      [  305.492008] f4c0: 0001000000000000 0000000000000000 ffff00000a5e8350 ffff00000859f330
      [  305.493063] [<ffff00000808205c>] __bad_stack+0x88/0x8c
      [  305.493396] [<ffff00000859f330>] recursive_loop+0x10/0x48
      [  305.493731] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.494088] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.494425] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.494649] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.494898] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.495205] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.495453] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.495708] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.496000] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.496302] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.496644] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.496894] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.497138] [<ffff00000859f358>] recursive_loop+0x38/0x48
      [  305.497325] [<ffff00000859f3dc>] lkdtm_OVERFLOW+0x14/0x20
      [  305.497506] [<ffff00000859f314>] lkdtm_do_action+0x1c/0x28
      [  305.497786] [<ffff00000859f178>] direct_entry+0xe0/0x170
      [  305.498095] [<ffff000008345568>] full_proxy_write+0x60/0xa8
      [  305.498387] [<ffff0000081fb7f4>] __vfs_write+0x1c/0x128
      [  305.498679] [<ffff0000081fcc68>] vfs_write+0xa0/0x1b0
      [  305.498926] [<ffff0000081fe0fc>] SyS_write+0x44/0xa0
      [  305.499182] Exception stack(0xffff00000a5ebec0 to 0xffff00000a5ec000)
      [  305.499429] bec0: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0
      [  305.499674] bee0: 574f4c465245564f 0000000000000000 0000000000000000 8000000080808080
      [  305.499904] bf00: 0000000000000040 0000000000000038 fefefeff1b4bc2ff 7f7f7f7f7f7fff7f
      [  305.500189] bf20: 0101010101010101 0000000000000000 000000000047a4c8 0000000000000038
      [  305.500712] bf40: 0000000000000000 0000ffffa2601280 0000ffffc63f6068 00000000004b5000
      [  305.501241] bf60: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0
      [  305.501791] bf80: 0000000000000020 0000000000000000 00000000004b5000 000000001c4cc458
      [  305.502314] bfa0: 0000000000000000 0000ffffc63f7950 000000000040a3c4 0000ffffc63f70e0
      [  305.502762] bfc0: 0000ffffa2601268 0000000080000000 0000000000000001 0000000000000040
      [  305.503207] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [  305.503680] [<ffff000008082fb0>] el0_svc_naked+0x24/0x28
      [  305.504720] Kernel Offset: disabled
      [  305.505189] CPU features: 0x002082
      [  305.505473] Memory Limit: none
      [  305.506181] ---[ end Kernel panic - not syncing: kernel stack overflow
      
      This patch was co-authored by Ard Biesheuvel and Mark Rutland.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Laura Abbott <labbott@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
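A C model of the two tricks in this commit: the single-bit overflow test on a stack aligned to twice its size, and swapping SP with a general-purpose register using only add/sub so the test can run on a copy of SP. `sketch_check_sp()` is hypothetical; it assumes 16K stacks (as in the log above) and, unlike the real entry code, always restores both registers instead of branching to the overflow handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 16K stacks, each aligned to 32K, so bit 14 of SP is 0 in range. */
#define SKETCH_THREAD_SIZE (UINT64_C(16) * 1024)

static bool sketch_check_sp(uint64_t *sp, uint64_t *x0)
{
    bool overflow;

    *sp = *sp + *x0;   /* sp' = sp + x0 */
    *x0 = *sp - *x0;   /* x0' = sp' - x0 = original sp */
    overflow = (*x0 & SKETCH_THREAD_SIZE) != 0;   /* single-bit test */
    *x0 = *sp - *x0;   /* x0'' = sp' - x0' = original x0 */
    *sp = *sp - *x0;   /* sp'' = sp' - x0'' = original sp */
    return overflow;
}
```

With the values from the log, the faulting SP 0xffff00000a5e7f50 lies just below the task stack base 0xffff00000a5e8000, so bit 14 is set; an SP inside the stack such as 0xffff00000a5ebeb8 has it clear. Unsigned wrap-around keeps the add/sub swap exact.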
    • arm64: add on_accessible_stack() · 12964443
      Committed by Mark Rutland
      Both unwind_frame() and dump_backtrace() try to check whether a stack
      address is sane to access, with very similar logic. Both will need
      updating in order to handle overflow stacks.
      
      Factor out this logic into a helper, so that we can avoid further
      duplication when we add overflow stacks.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Laura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
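A sketch of the factored-out check: one helper that both the unwinder and the backtrace code could call to decide whether an address lies on any known stack. The struct and function names are hypothetical; the ranges in the checks are the task, IRQ and overflow stacks printed in the VMAP_STACK overflow log above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one stack: [low, high). */
struct sketch_stack {
    uint64_t low;
    uint64_t high;
};

static bool sketch_on_stack(uint64_t sp, const struct sketch_stack *s)
{
    return s->low <= sp && sp < s->high;
}

/* True if sp lies on any of the n known stacks. */
static bool sketch_on_accessible_stack(uint64_t sp, const struct sketch_stack *stacks,
                                       size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (sketch_on_stack(sp, &stacks[i]))
            return true;
    return false;
}
```

Adding a new stack type then means adding one range, not touching every caller.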
    • arm64: add basic VMAP_STACK support · e3067861
      Committed by Mark Rutland
      This patch enables arm64 to be built with vmap'd task and IRQ stacks.
      
      As vmap'd stacks are mapped at page granularity, stacks must be a multiple of
      PAGE_SIZE. This means that a 64K page kernel must use stacks of at least 64K in
      size.
      
      To minimize the increase in Image size, IRQ stacks are dynamically allocated at
      boot time, rather than embedding the boot CPU's IRQ stack in the kernel image.
      
      This patch was co-authored by Ard Biesheuvel and Mark Rutland.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Laura Abbott <labbott@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
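A sketch of the sizing rule stated in this commit: because vmap'd stacks are mapped one page at a time, the stack size is the larger of the minimum stack size and the page size. `sketch_thread_size()` is hypothetical and takes both sizes as shifts (16K minimum is shift 14).

```c
#include <stdint.h>

/* Stack size for vmap'd stacks: at least one page, at least the minimum. */
static uint64_t sketch_thread_size(unsigned int min_thread_shift, unsigned int page_shift)
{
    unsigned int shift = min_thread_shift > page_shift ? min_thread_shift : page_shift;

    return UINT64_C(1) << shift;
}
```

With 4K pages a 16K minimum is kept; with 64K pages every stack grows to 64K, matching the text above.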
    • arm64: use an irq stack pointer · f60fe78f
      Committed by Mark Rutland
      We allocate our IRQ stacks using a percpu array. This allows us to generate our
      IRQ stack pointers with adr_this_cpu, but bloats the kernel Image with the boot
      CPU's IRQ stack. Additionally, these are packed with other percpu variables,
      and aren't guaranteed to have guard pages.
      
      When we enable VMAP_STACK we'll want to vmap our IRQ stacks also, in order to
      provide guard pages and to permit more stringent alignment requirements. Doing
      so will require that we use a percpu pointer to each IRQ stack, rather than
      allocating a percpu IRQ stack in the kernel image.
      
      This patch updates our IRQ stack code to use a percpu pointer to the base of
      each IRQ stack. This will allow us to change the way the stack is allocated
      with minimal changes elsewhere. In some cases we may try to backtrace before
      the IRQ stack pointers are initialised, so on_irq_stack() is updated to account
      for this.
      
      In testing with cyclictest, there was no measurable difference between using
      adr_this_cpu (for irq_stack) and ldr_this_cpu (for irq_stack_ptr) in the IRQ
      entry path.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Laura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
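A sketch of why on_irq_stack() must handle an uninitialised pointer once IRQ stacks are reached through a per-CPU pointer: without the zero check, a base of 0 would make every small address look like it is on the IRQ stack. The array standing in for the per-CPU variable, the CPU count and the 16K size are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS        4
#define SKETCH_IRQ_STACK_SIZE (UINT64_C(16) * 1024)

/* Hypothetical stand-in for the per-CPU irq_stack_ptr; 0 until allocated. */
static uint64_t sketch_irq_stack_ptr[SKETCH_NR_CPUS];

static bool sketch_on_irq_stack(unsigned int cpu, uint64_t sp)
{
    uint64_t low = sketch_irq_stack_ptr[cpu];
    uint64_t high = low + SKETCH_IRQ_STACK_SIZE;

    if (!low)   /* not initialised yet, e.g. a backtrace very early in boot */
        return false;
    return low <= sp && sp < high;
}
```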
    • arm64: factor out entry stack manipulation · b11e5759
      Committed by Mark Rutland
      In subsequent patches, we will detect stack overflow in our exception
      entry code, by verifying the SP after it has been decremented to make
      space for the exception regs.
      
      This verification code is small, and we can minimize its impact by
      placing it directly in the vectors. To avoid redundant modification of
      the SP, we also need to move the initial decrement of the SP into the
      vectors.
      
      As a preparatory step, this patch introduces kernel_ventry, which
      performs this decrement, and updates the entry code accordingly.
      Subsequent patches will fold SP verification into kernel_ventry.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      [Mark: turn into prep patch, expand commit msg]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Laura Abbott <labbott@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
    • arm64: move SEGMENT_ALIGN to <asm/memory.h> · 8018ba4e
      Committed by Mark Rutland
      Currently we define SEGMENT_ALIGN directly in our vmlinux.lds.S.
      
      This is unfortunate, as the EFI stub currently open-codes the same
      number, and in future we'll want to fiddle with this.
      
      This patch moves the definition to our <asm/memory.h>, where it can be
      used by both vmlinux.lds.S and the EFI stub code.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Laura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      8018ba4e
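A minimal sketch of the idea behind a single shared definition, assuming a 64K value for SEGMENT_ALIGN (the real value depends on the kernel configuration) and a hypothetical stub_segment_start() consumer standing in for code that previously open-coded the number. Every user rounds against the one macro, so changing it later touches a single place.

```c
#include <assert.h>
#include <stdint.h>

/* One shared definition; the 64K value is an assumption for this sketch. */
#define SEGMENT_ALIGN 0x10000ULL

/* Round x up to the next multiple of a (a must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    assert(a && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Hypothetical consumer: uses the macro instead of repeating the number. */
static uint64_t stub_segment_start(uint64_t addr)
{
    return align_up(addr, SEGMENT_ALIGN);
}
```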
    • M
      arm64: clean up irq stack definitions · f60ad4ed
      Mark Rutland committed
      Before we add yet another stack to the kernel, it would be nice to
      ensure that we consistently organise stack definitions and related
      helper functions.
      
      This patch moves the basic IRQ stack definitions to <asm/memory.h> to
      live with their task stack counterparts. Helpers used for unwinding are
      moved into <asm/stacktrace.h>, where subsequent patches will add helpers
      for other stacks. Includes are fixed up accordingly.
      
      This patch is a pure refactoring -- there should be no functional
      changes as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Laura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      f60ad4ed
    • A
      arm64: kernel: remove {THREAD,IRQ_STACK}_START_SP · 34be98f4
      Ard Biesheuvel committed
      For historical reasons, we leave the top 16 bytes of our task and IRQ
      stacks unused, a practice used to ensure that the SP can always be
      masked to find the base of the current stack (historically, where
      thread_info could be found).
      
      However, this is not necessary, as:
      
      * When an exception is taken from a task stack, we decrement the SP by
        S_FRAME_SIZE and stash the exception registers before we compare the
        SP against the task stack. In such cases, the SP must be at least
        S_FRAME_SIZE below the limit, and can be safely masked to determine
        whether the task stack is in use.
      
      * When transitioning to an IRQ stack, we'll place a dummy frame onto the
        IRQ stack before enabling asynchronous exceptions, or executing code
        we expect to trigger faults. Thus, if an exception is taken from the
        IRQ stack, the SP must be at least 16 bytes below the limit.
      
      * We no longer mask the SP to find the thread_info, which is now found
        via sp_el0. Note that historically, the offset was critical to ensure
        that cpu_switch_to() found the correct stack for new threads that
        hadn't yet executed ret_from_fork().
      
      Given that, this initial offset serves no purpose, and can be removed.
      This brings us in-line with other architectures (e.g. x86) which do not
      rely on this masking.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      [Mark: rebase, kill THREAD_START_SP, commit msg additions]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Laura Abbott <labbott@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      34be98f4
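The masking argument can be checked with a small user-space model, assuming 16K stacks aligned to their own size and a hypothetical S_FRAME_SIZE (the real value is derived from struct pt_regs). An untouched top-of-stack SP masks to the base of the next region, which is what the reserved 16 bytes used to guard against; once the entry code has decremented the SP, or a 16-byte dummy frame has been pushed, masking finds the correct base.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for this sketch: 16K stacks aligned to their own size, and a
 * hypothetical exception frame size (the real one comes from pt_regs). */
#define THREAD_SIZE  0x4000ULL
#define S_FRAME_SIZE 0x130ULL

/* Mask an SP value down to the base of the THREAD_SIZE-aligned region holding it. */
static uint64_t stack_base(uint64_t sp)
{
    return sp & ~(THREAD_SIZE - 1);
}

/* True if masking sp finds the stack whose base is 'base'. */
static bool masks_to(uint64_t sp, uint64_t base)
{
    return stack_base(sp) == base;
}
```

For a stack based at 0x8010000, the initial SP 0x8014000 masks to 0x8014000 (the next region), while 0x8014000 - S_FRAME_SIZE masks back to 0x8010000.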
    • M
      arm64: remove __die()'s stack dump · c5bc503c
      Mark Rutland committed
      Our __die() implementation tries to dump the stack memory, in addition
      to a backtrace, which is problematic.
      
      For contemporary 16K stacks, this can be a lot of data, which can take a
      long time to dump, and can push other useful context out of the kernel's
      printk ringbuffer (and/or a user's scrollback buffer on an attached
      console).
      
      Additionally, the code implicitly assumes that the SP is on the task's
      stack, and tries to dump everything between the SP and the highest task
      stack address. When the SP points at an IRQ stack (or is corrupted),
      this makes the kernel attempt to dump vast amounts of VA space. With
      vmap'd stacks, this may result in erroneous accesses to peripherals.
      
      This patch removes the memory dump, leaving us to rely on the backtrace,
      and other means of dumping stack memory such as kdump.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Laura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      c5bc503c
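To illustrate why the implicit assumption was dangerous, this sketch models the removed dump length (top of the task stack minus SP) using hypothetical addresses and a 16K THREAD_SIZE. The patch removes the dump entirely rather than bounding it; the on_task_stack() helper here only makes the missing assumption explicit and is not part of the change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16K task stack size for this sketch. */
#define THREAD_SIZE 0x4000ULL

/*
 * What the removed code implicitly computed: dump from sp up to the top of
 * the task stack, assuming sp lies on that stack.
 */
static uint64_t naive_dump_len(uint64_t sp, uint64_t task_stack_base)
{
    return task_stack_base + THREAD_SIZE - sp;
}

/* The assumption the naive computation relies on, made explicit. */
static bool on_task_stack(uint64_t sp, uint64_t task_stack_base)
{
    return task_stack_base <= sp && sp < task_stack_base + THREAD_SIZE;
}
```

When SP sits on a separate stack at a lower address, the computed length covers all of the VA space between the two stacks.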
  18. 11 Aug 2017, 4 commits
  19. 09 Aug 2017, 5 commits
    • A
      arm64: unwind: remove sp from struct stackframe · 31e43ad3
      Ard Biesheuvel committed
      The unwind code sets the sp member of struct stackframe to
      'frame pointer + 0x10' unconditionally, without regard for whether
      doing so produces a legal value. So let's simply remove it now that
      we have stopped using it anyway.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      31e43ad3
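A minimal user-space sketch of frame-record unwinding, assuming the standard AArch64 layout in which the frame pointer (x29) points at a {previous FP, LR} pair. The walk needs only fp and pc, so an sp member derived as fp + 0x10 carries nothing the unwinder uses. The types are simplified stand-ins, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* AArch64 frame record: x29 points at {previous frame pointer, return address}. */
struct frame_record {
    const struct frame_record *fp;
    uint64_t lr;
};

/* Unwinder state holding only fp and pc; no sp member is needed. */
struct stackframe {
    const struct frame_record *fp;
    uint64_t pc;
};

/* Step to the caller's frame; returns 0 on success, -1 at the end of the chain. */
static int unwind_frame(struct stackframe *frame)
{
    const struct frame_record *rec = frame->fp;

    if (!rec)
        return -1;
    frame->fp = rec->fp;
    frame->pc = rec->lr;
    return 0;
}

/* Collect up to max return addresses, starting from the given frame. */
static size_t walk(struct stackframe frame, uint64_t *pcs, size_t max)
{
    size_t n = 0;

    while (n < max && unwind_frame(&frame) == 0)
        pcs[n++] = frame.pc;
    return n;
}
```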
    • A
      arm64: unwind: reference pt_regs via embedded stack frame · 73267498
      Ard Biesheuvel committed
      As it turns out, the unwind code is slightly broken, and probably has
      been for a while. The problem is in the dumping of the exception stack,
      which is intended to dump the contents of the pt_regs struct at each
      level in the call stack where an exception was taken and routed to a
      routine marked as __exception (which means its stack frame is right
      below the pt_regs struct on the stack).
      
      'Right below the pt_regs struct' is ill defined, though: the unwind
      code assigns 'frame pointer + 0x10' to the .sp member of the stackframe
      struct at each level, and dump_backtrace() happily dereferences that as
      the pt_regs pointer when encountering an __exception routine. However,
      the actual size of the stack frame created by this routine (which could
      be one of many __exception routines we have in the kernel) is not known,
      and so frame.sp is pretty useless to figure out where struct pt_regs
      really is.
      
      So it seems the only way to ensure that we can find our struct pt_regs
      when walking the stack frames is to put it at a known fixed offset of
      the stack frame pointer that is passed to such __exception routines.
      The simplest way to do that is to put the frame record inside pt_regs
      itself, which is the main change implemented by this patch. As a bonus,
      doing this allows
      us to get rid of a fair amount of cruft related to walking from one stack
      to the other, which is especially nice since we intend to introduce yet
      another stack for overflow handling once we add support for vmapped
      stacks. It also fixes an inconsistency where we only add a stack frame
      pointing to ELR_EL1 if we are executing from the IRQ stack but not when
      we are executing from the task stack.
      
      To consistently identify exception regs even in the presence of exceptions
      taken from entry code, we must check whether the next frame was created
      by entry text, rather than whether the current frame was created by
      exception text.
      
      To avoid backtracing using PCs that fall in the idmap, or are controlled
      by userspace, we must explicitly zero the FP and LR in startup paths, and
      must ensure that the frame embedded in pt_regs is zeroed upon entry from
      EL0. To avoid these NULL entries showing in the backtrace, unwind_frame()
      is updated to skip them.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      [Mark: compare current frame against .entry.text, avoid bogus PCs]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      73267498
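The fixed-offset idea can be sketched as follows, assuming a simplified pt_regs (the real structure has more fields) with the frame record embedded at its end. Because the offset of stackframe is a compile-time constant, a frame pointer that points at it identifies the enclosing pt_regs regardless of how large any __exception routine's own frame is. The terminal-frame test (both FP and PC zero) is an assumption about how unwind_frame() skips the zeroed frames created on entry from EL0.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct pt_regs; only the layout idea matters here. */
struct pt_regs {
    uint64_t regs[31];
    uint64_t sp;
    uint64_t pc;
    uint64_t pstate;
    uint64_t stackframe[2];  /* frame record {fp, lr} at a fixed offset */
};

/*
 * Given a frame pointer that points at regs->stackframe, recover the
 * enclosing pt_regs by subtracting the field's fixed offset.
 */
static struct pt_regs *regs_from_fp(uint64_t *fp)
{
    return (struct pt_regs *)((char *)fp - offsetof(struct pt_regs, stackframe));
}

/* Assumed rule: a frame with both FP and PC zero ends the backtrace. */
static int frame_is_terminal(uint64_t fp, uint64_t pc)
{
    return !fp && !pc;
}
```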
    • D
      arm64/vdso: Support mremap() for vDSO · 73958695
      Dmitry Safonov committed
      The vDSO VMA address is saved in mm_context so that the signal restorer
      in the vDSO page can be used to return to userspace after signal
      handling.
      
      In the Checkpoint Restore in Userspace (CRIU) project, on restore we put
      the vDSO VMA back at the address it had at dump time. Except on x86
      (where arch_prctl() provides an API to map the vDSO), we move the vDSO
      inherited from the CRIU task to the restoree's position with mremap().
      
      CRIU does support the arm64 architecture, but the kernel doesn't update
      the context.vdso pointer after mremap(), which results in a translation
      fault after signal handling in the restored application:
      https://github.com/xemul/criu/issues/288
      
      Make the vDSO code track the VMA address by supplying an .mremap() hook,
      the same way it's done for x86 and arm32 by:
      commit b059a453 ("x86/vdso: Add mremap hook to vm_special_mapping")
      commit 280e87e9 ("ARM: 8683/1: ARM32: Support mremap() for sigpage/vDSO").
      
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: Christopher Covington <cov@codeaurora.org>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      73958695
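A simplified model of the hook: when the vDSO VMA is moved, the new start address is written back to the mm context so that signal return uses the trampoline at its new location. The vma and mm_context types and VDSO_SIZE are hypothetical stand-ins, and rejecting a move that changes the mapping size is an assumption for this sketch rather than something stated above.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the kernel structures involved. */
struct vma { uint64_t vm_start, vm_end; };
struct mm_context { uint64_t vdso; };

/* Hypothetical size of the vDSO mapping being tracked. */
#define VDSO_SIZE 0x2000ULL

/*
 * Called after the vDSO VMA has been moved: reject a resized mapping and
 * record the new base address for later signal returns.
 */
static int vdso_mremap(struct mm_context *ctx, const struct vma *new_vma)
{
    if (new_vma->vm_end - new_vma->vm_start != VDSO_SIZE)
        return -EINVAL;
    ctx->vdso = new_vma->vm_start;
    return 0;
}
```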
    • R
      arm64: Implement pmem API support · d50e071f
      Robin Murphy committed
      Add a clean-to-point-of-persistence cache maintenance helper, and wire
      up the basic architectural support for the pmem driver based on it.
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      [catalin.marinas@arm.com: move arch_*_pmem() functions to arch/arm64/mm/flush.c]
      [catalin.marinas@arm.com: change dmb(sy) to dmb(osh)]
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      d50e071f
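Clean-to-PoP maintenance over a buffer is applied one cache line at a time, starting from the line that contains the first byte. The sketch below models that loop in user space under stated assumptions: dc_cvap() is a hypothetical stand-in that only records addresses where real code would execute DC CVAP, the line size is passed in (real code derives it from CTR_EL0), the barriers mentioned in the commit are omitted, and clean_range_to_pop() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-line "DC CVAP": it only records addresses. */
static uint64_t cleaned[64];
static size_t ncleaned;

static void dc_cvap(uint64_t addr)
{
    if (ncleaned < sizeof(cleaned) / sizeof(cleaned[0]))
        cleaned[ncleaned] = addr;
    ncleaned++;
}

/* Clean [start, start + size) one cache line at a time, beginning with the
 * line that contains start. */
static void clean_range_to_pop(uint64_t start, uint64_t size, uint64_t line)
{
    uint64_t end = start + size;

    assert(line && (line & (line - 1)) == 0);  /* line size must be a power of two */
    for (uint64_t addr = start & ~(line - 1); addr < end; addr += line)
        dc_cvap(addr);
}
```

A 64-byte range starting at 0x1030 straddles two 64-byte lines, so lines 0x1000 and 0x1040 are both cleaned.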
    • R
      arm64: Handle trapped DC CVAP · e1bc5d1b
      Robin Murphy committed
      Cache clean to PoP is subject to the same access controls as to PoC, so
      if we are trapping userspace cache maintenance with SCTLR_EL1.UCI, we
      need to be prepared to handle it. To avoid getting into complicated
      fights with binutils about ARMv8.2 options, we'll just cheat and use the
      raw SYS instruction rather than the 'proper' DC alias.
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      e1bc5d1b
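The raw SYS form of DC CVAP is SYS #3, C7, C12, #1, Xt. When EL0 cache maintenance is trapped, the instruction can be recognised from the ISS of the syndrome register (exception class 0x18). The decoder below is a sketch based on the architectural ISS layout for that class; the kernel's handler dispatches on the CRm field, whereas this version checks every field for clarity. sys_iss() and is_dc_cvap() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* ISS fields for a trapped MSR/MRS/system instruction (ESR_ELx.EC == 0x18). */
#define ISS_CRM(iss)  (((iss) >> 1) & 0xf)
#define ISS_RT(iss)   (((iss) >> 5) & 0x1f)
#define ISS_CRN(iss)  (((iss) >> 10) & 0xf)
#define ISS_OP1(iss)  (((iss) >> 14) & 0x7)
#define ISS_OP2(iss)  (((iss) >> 17) & 0x7)
#define ISS_OP0(iss)  (((iss) >> 20) & 0x3)

/* Build an ISS value from instruction fields (direction bit 0 = write). */
static uint32_t sys_iss(uint32_t op0, uint32_t op1, uint32_t crn, uint32_t crm,
                        uint32_t op2, uint32_t rt)
{
    return (op0 << 20) | (op2 << 17) | (op1 << 14) | (crn << 10) |
           (rt << 5) | (crm << 1);
}

/* DC CVAP is SYS #3, C7, C12, #1, Xt; SYS instructions have op0 == 1. */
static bool is_dc_cvap(uint32_t iss)
{
    return ISS_OP0(iss) == 1 && ISS_OP1(iss) == 3 && ISS_CRN(iss) == 7 &&
           ISS_CRM(iss) == 12 && ISS_OP2(iss) == 1;
}
```

ISS_RT() then yields the register holding the virtual address to clean.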